Closed yescallop closed 7 months ago
The first item is fixed by https://github.com/lo48576/iri-string/commit/67f1e348cfc647f51ea680e7aa223f0eee30767a.
When an IRI uses components of the generic syntax, the component syntax equivalence rules always apply; namely, that the scheme and US-ASCII only host are case insensitive and therefore should be normalized to lowercase. (snip) Case equivalence for non-ASCII characters in IRI components that are IDNs are discussed in section 5.3.3.
https://github.com/lo48576/iri-string/blob/021fce896eed51d388161170fdd30c82cd664ba8/src/normalize.rs#L513-L521 https://github.com/lo48576/iri-string/blob/021fce896eed51d388161170fdd30c82cd664ba8/src/parser/trusted.rs#L465-L470
So... is %99B "US-ASCII only"? The current code considers it's not, because 0x99 is not a valid US-ASCII character.
I don't remember how non-decoded percent-encoding should be handled, so I need to do more research.
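To make the question concrete, here is a minimal sketch of the two readings of "US-ASCII only" for a host such as %99B. It is not the code in iri-string; both function names are hypothetical and only the standard library is used. The first function mirrors the reading described above (a percent-encoded octet of 0x80 or higher makes the host non-ASCII). The second lowercases ASCII letters regardless, while keeping the hex digits of each percent-encoded triplet uppercase as RFC 3986 section 6.2.2.1 recommends.

```rust
/// Hypothetical helper: returns true if the host contains only US-ASCII
/// once percent-encoded triplets are decoded. Under this reading,
/// "%99B" is NOT US-ASCII only, because 0x99 >= 0x80.
fn decodes_to_ascii_only(host: &str) -> bool {
    let bytes = host.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Try to read the two hex digits after '%'.
            let octet = host
                .get(i + 1..i + 3)
                .filter(|hex| hex.bytes().all(|b| b.is_ascii_hexdigit()))
                .and_then(|hex| u8::from_str_radix(hex, 16).ok());
            match octet {
                Some(o) if o >= 0x80 => return false,
                Some(_) => i += 3,
                // Malformed triplet: treat '%' as a plain ASCII byte here.
                None => i += 1,
            }
        } else if !bytes[i].is_ascii() {
            return false;
        } else {
            i += 1;
        }
    }
    true
}

/// Hypothetical helper: lowercases ASCII letters in the host and
/// uppercases the hex digits of percent-encoded triplets, without
/// decoding anything. Non-ASCII characters are left untouched.
fn normalize_host_case(host: &str) -> String {
    let mut out = String::with_capacity(host.len());
    // Number of hex digits still expected for the current "%XX" triplet.
    let mut pct_remaining = 0;
    for c in host.chars() {
        if pct_remaining > 0 && c.is_ascii_hexdigit() {
            out.push(c.to_ascii_uppercase());
            pct_remaining -= 1;
        } else if c == '%' {
            out.push(c);
            pct_remaining = 2;
        } else {
            pct_remaining = 0;
            out.push(c.to_ascii_lowercase());
        }
    }
    out
}
```

Under the first reading, decodes_to_ascii_only("%99B") is false and the host would be left as is. Under the second, normalize_host_case("%99B") gives "%99b", which matches the result expected in the report. Which reading RFC 3987 intends is exactly the open question here.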
I see why our implementations differ in the second case now. I hadn't taken a proper look at RFC 3987 and wrote my code based solely on RFC 3986, which only says "the scheme and host are case-insensitive" instead of "the scheme and US-ASCII only host are case insensitive". I also have no idea what "US-ASCII only" means in that context.
That said, I have just found another case which I guess is a bug: normalizing "a:/%92%99" yields "a:/%92", in which %99 is omitted for some reason.
The a:/%92%99 case is fixed by https://github.com/lo48576/iri-string/commit/547f0af8bb109426f94c98c1e9615b6053521470.
Released v0.7.2 with the fixes for obviously wrong resolution/normalization. Thank you for reporting!
I created a separate issue #38 for the a:/%92%99 case, so I'll close this issue as fixed.
Hi, I found two possible issues with iri-string through differential fuzzing:

1. Resolving "/.//." against "a:/" (or ".//." against "a:/") yields "a://". Normalizing "a:/.//." yields "a://". Both results are supposed to be "a:/.//" IIUC.
2. Normalizing "a://%99B/" yields "a://%99B/". The result is supposed to be "a://%99b/" IIUC.

The fuzz targets used are: resolve_against_iri_string.rs, normalize_against_iri_string.rs

Note that you would need to uncomment the three lines in normalize_against_iri_string.rs to reveal the second possible issue.
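For the first item, here is a rough sketch of why "a://" comes out and how "a:/.//" can be obtained instead. It follows the remove_dot_segments algorithm from RFC 3986 section 5.2.4 using only the standard library. The function names are hypothetical and this is not the code from iri-string or from the fuzz targets. The "/." prefix step is an assumption based on the expected result above: RFC 3986 section 3.3 says a path cannot begin with "//" when there is no authority, and prefixing "/." is one way to keep the recomposed string from being parsed as having an empty authority.

```rust
/// Removes dot segments from a path, following RFC 3986 section 5.2.4.
fn remove_dot_segments(path: &str) -> String {
    let mut input = path;
    let mut output = String::new();
    while !input.is_empty() {
        if let Some(rest) = input.strip_prefix("../") {
            // Step 2A: drop a leading "../".
            input = rest;
        } else if let Some(rest) = input.strip_prefix("./") {
            // Step 2A: drop a leading "./".
            input = rest;
        } else if input.starts_with("/./") {
            // Step 2B: replace a leading "/./" with "/".
            input = &input[2..];
        } else if input == "/." {
            // Step 2B: a trailing "/." becomes "/".
            input = "/";
        } else if input.starts_with("/../") {
            // Step 2C: replace "/../" with "/" and drop the last output segment.
            input = &input[3..];
            pop_last_segment(&mut output);
        } else if input == "/.." {
            // Step 2C: a trailing "/.." behaves the same way.
            input = "/";
            pop_last_segment(&mut output);
        } else if input == "." || input == ".." {
            // Step 2D: a lone "." or ".." is removed.
            input = "";
        } else {
            // Step 2E: move the first segment (with its leading "/") to the output.
            let start = if input.starts_with('/') { 1 } else { 0 };
            let end = input[start..]
                .find('/')
                .map_or(input.len(), |i| i + start);
            output.push_str(&input[..end]);
            input = &input[end..];
        }
    }
    output
}

/// Removes the last segment and its preceding "/" from the output buffer.
fn pop_last_segment(output: &mut String) {
    match output.rfind('/') {
        Some(i) => output.truncate(i),
        None => output.clear(),
    }
}

/// Hypothetical helper: recomposes "scheme:path" for a reference with no
/// authority. If the normalized path starts with "//", it is prefixed with
/// "/." (an assumption, see the note above) so it is not read as an authority.
fn recompose_without_authority(scheme: &str, path: &str) -> String {
    let normalized = remove_dot_segments(path);
    if normalized.starts_with("//") {
        format!("{}:/.{}", scheme, normalized)
    } else {
        format!("{}:{}", scheme, normalized)
    }
}
```

With this sketch, remove_dot_segments("/.//.") returns "//", which is where a naive recomposition gets "a://" from, while recompose_without_authority("a", "/.//.") returns "a:/.//".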