lo48576 / iri-string

String types for URIs/IRIs.
Apache License 2.0

Possible issues with reference resolution and normalization #36

Closed yescallop closed 7 months ago

yescallop commented 7 months ago

Hi, I found two possible issues with iri-string through differential fuzzing:

The fuzz targets used are: resolve_against_iri_string.rs, normalize_against_iri_string.rs

Note that you would need to uncomment the three lines in normalize_against_iri_string.rs to reveal the second possible issue.
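For readers unfamiliar with the technique, differential fuzzing feeds the same generated input to two implementations and reports any input on which they disagree. The following is a minimal, std-only sketch of that idea. It is not the code of the fuzz targets named above; the names `first_disagreement`, `random_inputs`, and `XorShift` are hypothetical, and a real target would use a coverage-guided fuzzing engine rather than this toy generator.

```rust
/// Tiny deterministic pseudo-random generator (xorshift). Illustrative only.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        // The state must be non-zero, otherwise it stays zero forever.
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Generates `count` strings of at most `max_len` bytes from a small alphabet
/// chosen (as an assumption) to hit percent-encoding and case handling.
fn random_inputs(seed: u64, count: usize, max_len: usize) -> Vec<String> {
    const ALPHABET: &[u8] = b"aB:/%29";
    let mut rng = XorShift(seed.max(1));
    (0..count)
        .map(|_| {
            let len = (rng.next_u64() % (max_len as u64 + 1)) as usize;
            (0..len)
                .map(|_| ALPHABET[(rng.next_u64() % ALPHABET.len() as u64) as usize] as char)
                .collect()
        })
        .collect()
}

/// Returns the first input on which the two implementations disagree,
/// together with both outputs.
fn first_disagreement<I, A, B>(
    inputs: I,
    impl_a: A,
    impl_b: B,
) -> Option<(String, String, String)>
where
    I: IntoIterator<Item = String>,
    A: Fn(&str) -> String,
    B: Fn(&str) -> String,
{
    inputs.into_iter().find_map(|input| {
        let (a, b) = (impl_a(&input), impl_b(&input));
        if a != b {
            Some((input, a, b))
        } else {
            None
        }
    })
}
```

A disagreement found this way is only a candidate issue: either implementation, or neither, may be the one that deviates from the specification.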

lo48576 commented 7 months ago

The first item is fixed by https://github.com/lo48576/iri-string/commit/67f1e348cfc647f51ea680e7aa223f0eee30767a.

lo48576 commented 7 months ago

When an IRI uses components of the generic syntax, the component syntax equivalence rules always apply; namely, that the scheme and US-ASCII only host are case insensitive and therefore should be normalized to lowercase. (snip) Case equivalence for non-ASCII characters in IRI components that are IDNs are discussed in section 5.3.3.

—— RFC 3987, §5.3.2.1 Case Normalization

https://github.com/lo48576/iri-string/blob/021fce896eed51d388161170fdd30c82cd664ba8/src/normalize.rs#L513-L521
https://github.com/lo48576/iri-string/blob/021fce896eed51d388161170fdd30c82cd664ba8/src/parser/trusted.rs#L465-L470

So... is %99B "US-ASCII only"? The current code considers that it is not, because 0x99 is not a valid US-ASCII character. I don't remember how percent-encoded octets that are left undecoded should be handled, so I need to do more research.
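To make the two readings concrete, here is a minimal sketch of host case normalization under each one. It is not the code linked above; `HostCasePolicy`, `is_us_ascii_only`, and `normalize_host_case` are hypothetical names, and the sketch assumes the host has already been validated, so every `%` is followed by two hex digits. Treating a percent-encoded octet of 0x80 or above as "not US-ASCII only" is one possible interpretation of RFC 3987, not a settled one.

```rust
/// Which hosts get their letters lowercased.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum HostCasePolicy {
    /// RFC 3986 reading: the host is always case-insensitive.
    AlwaysLowercase,
    /// One reading of RFC 3987: lowercase only when every literal character
    /// and every percent-encoded octet is US-ASCII.
    AsciiOnly,
}

fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Returns false if the host contains a non-ASCII character or a
/// percent-encoded octet >= 0x80 (for example `%99`).
fn is_us_ascii_only(host: &str) -> bool {
    let bytes = host.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if !bytes[i].is_ascii() {
            return false;
        }
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                if hi * 16 + lo >= 0x80 {
                    return false;
                }
                i += 3;
                continue;
            }
        }
        i += 1;
    }
    true
}

/// Lowercases literal ASCII letters and uppercases the hex digits of
/// percent-encoded triplets, unless the policy says to leave the host alone.
fn normalize_host_case(host: &str, policy: HostCasePolicy) -> String {
    if policy == HostCasePolicy::AsciiOnly && !is_us_ascii_only(host) {
        return host.to_owned();
    }
    let mut out = String::with_capacity(host.len());
    let mut hex_digits_left = 0u8; // hex digits of the current triplet not yet copied
    for c in host.chars() {
        if hex_digits_left > 0 {
            out.push(c.to_ascii_uppercase());
            hex_digits_left -= 1;
        } else {
            if c == '%' {
                hex_digits_left = 2;
            }
            out.push(c.to_ascii_lowercase());
        }
    }
    out
}
```

Under `AsciiOnly` the host `%99B` is returned unchanged, while under `AlwaysLowercase` it becomes `%99b`; that difference is the kind of disagreement discussed here.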

yescallop commented 7 months ago

I see why our implementations differ in the second case now. I hadn't taken a proper look at RFC 3987 and wrote my code based solely on RFC 3986, which only says "the scheme and host are case-insensitive" instead of "the scheme and US-ASCII only host are case insensitive". I also have no idea what "US-ASCII only" means in that context.

That said, I have just found another case which I guess is a bug: Normalizing "a:/%92%99" yields "a:/%92", in which %99 is omitted for some reason.
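For reference, the expected behavior can be sketched with the percent-encoding rules of RFC 3986 §6.2.2: hex digits are uppercased, triplets that encode unreserved characters are decoded, and every other triplet is kept. This is a minimal sketch, not the library's implementation; `normalize_pct_encoding` and its helpers are hypothetical names, and the sketch deliberately does not decode any octet of 0x80 or above into a non-ASCII character.

```rust
/// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Decodes the two hex digits that follow a '%', if present.
fn decode_triplet(after_percent: &str) -> Option<u8> {
    let b = after_percent.as_bytes();
    if b.len() < 2 {
        return None;
    }
    let hi = (b[0] as char).to_digit(16)?;
    let lo = (b[1] as char).to_digit(16)?;
    Some((hi * 16 + lo) as u8)
}

/// Normalizes percent-encoded triplets without dropping any of them.
fn normalize_pct_encoding(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find('%') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        match decode_triplet(after) {
            // Unreserved characters are written in decoded form.
            Some(byte) if is_unreserved(byte) => {
                out.push(byte as char);
                rest = &after[2..];
            }
            // Every other octet stays encoded, with uppercase hex digits.
            Some(byte) => {
                out.push_str(&format!("%{:02X}", byte));
                rest = &after[2..];
            }
            // A stray '%' is copied through unchanged.
            None => {
                out.push('%');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

With this sketch, `normalize_pct_encoding("a:/%92%99")` returns the input unchanged, because neither 0x92 nor 0x99 is an unreserved character and both triplets are already in uppercase form.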

lo48576 commented 7 months ago

The a:/%92%99 case is fixed by https://github.com/lo48576/iri-string/commit/547f0af8bb109426f94c98c1e9615b6053521470.

lo48576 commented 7 months ago

Released v0.7.2 with the fixes for obviously wrong resolution/normalization. Thank you for reporting!

lo48576 commented 7 months ago

I created a separate issue #38 for the a:/%92%99 case, so I'll close this issue as fixed.