haskell / network-uri

URI manipulation facilities

"\65535" not handled by escape/unescape #51

Closed ezrakilty closed 4 years ago

ezrakilty commented 4 years ago

A randomized test failure (https://travis-ci.org/haskell/network-uri/jobs/652033164) shows that the string "\65535" is apparently not handled correctly in an escape/unescape round-trip. It's a bit of an edge case, for sure, but let's track it down.

ezrakilty commented 4 years ago

This is a can of worms. The URI spec actually only defines percent-encoding for octets, not any wider set of characters. It defers to other definitions to specify character encodings: §1.2.1 says, "Such a definition should specify the character encoding used to map those characters to octets prior to being percent-encoded for the URI."

However, the Network.URI implementation currently does UTF-8-encoding before performing percent-encoding, so it already takes an opinion on what is supposed to be left to other layers.
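To make the two layers concrete, here is a minimal sketch of "UTF-8-encode, then percent-encode each octet". It is not the Network.URI source: the names `utf8Octets`, `percentOctet`, `isUnreservedChar`, and `percentEncode` are hypothetical, and it deliberately does no special handling of surrogates or noncharacters.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Numeric (showHex)

-- | UTF-8 encode a single code point into octets (values 0..255).
-- Hypothetical helper: encodes every code point mechanically.
utf8Octets :: Char -> [Int]
utf8Octets c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- | Render one octet as "%XX" with two uppercase hex digits.
percentOctet :: Int -> String
percentOctet o = '%' : pad (map toUpper (showHex o ""))
  where
    pad [d] = ['0', d]
    pad ds  = ds

-- | The "unreserved" set of RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~".
isUnreservedChar :: Char -> Bool
isUnreservedChar c = isAscii c && (isAlphaNum c || c `elem` "-._~")

-- | Percent-encode everything outside the unreserved set,
-- going through UTF-8 first (the "opinion" discussed above).
percentEncode :: String -> String
percentEncode = concatMap enc
  where
    enc c
      | isUnreservedChar c = [c]
      | otherwise          = concatMap percentOctet (utf8Octets c)
```

With this sketch, `percentEncode "a \233"` gives `"a%20%C3%A9"`, and a purely mechanical encoding of `"\65535"` would give `"%EF%BF%BF"`.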

ezrakilty commented 4 years ago

Ultimately, I determined that the codepoint in question, \65535 (U+FFFF), is a "noncharacter" in Unicode, and the UTF-8 implementation in this module encodes it as an error (using the "replacement character" as specified). I haven't found definitive spec text that says whether it's an error or not, but I decided it's OK for the implementation to encode it as an error, and I changed the test to not generate that input. The same is true of \65534 (U+FFFE), which is also a noncharacter.
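For anyone who wants to filter such inputs in their own round-trip tests, a rough sketch of a noncharacter check follows. It uses the full Unicode definition (U+FDD0..U+FDEF plus the last two code points of every plane), which may be broader than what this module actually treats as an error, and it is not taken from the fixing commit. The names `isNoncharacter`, `replaceNoncharacters`, and `roundTripSafe` are hypothetical.

```haskell
import Data.Bits ((.&.))
import Data.Char (ord)

-- | True for Unicode noncharacters: U+FDD0..U+FDEF, and the code points
-- ending in FFFE or FFFF in every plane (U+FFFE, U+FFFF, U+1FFFE, ...).
isNoncharacter :: Char -> Bool
isNoncharacter c =
  (n >= 0xFDD0 && n <= 0xFDEF) || (n .&. 0xFFFE) == 0xFFFE
  where
    n = ord c

-- | Substitute U+FFFD (the replacement character) for noncharacters.
-- An assumption about the observable effect, not the module's own code.
replaceNoncharacters :: String -> String
replaceNoncharacters = map (\c -> if isNoncharacter c then '\xFFFD' else c)

-- | A string can only be expected to survive an escape/unescape
-- round-trip unchanged if it contains no noncharacters.
roundTripSafe :: String -> Bool
roundTripSafe = not . any isNoncharacter
```

In a QuickCheck property, a predicate like `roundTripSafe` could be used with `suchThat` or `==>` to keep these code points out of the generated inputs.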

nomeata commented 2 years ago

I came across the same test failure and found this issue. Has at least the test suite been fixed to not randomly fail, @ezrakilty?

ezrakilty commented 2 years ago

Yes, the failure I hit was fixed with commit https://github.com/haskell/network-uri/commit/7f79d446cae44c9c99a61aafa8d30dacbd5bfe50

What did you hit? Perhaps it was a different test?

nomeata commented 2 years ago

Probably just an old version. All good!