asok opened this issue 2 years ago (status: Open).
cf. https://github.com/ruby/webrick/issues/110, especially this comment https://github.com/ruby/webrick/issues/110#issuecomment-1436135222.
This is because the URI library doesn't support RFC 3987 (Internationalized Resource Identifiers (IRIs)).
No, a URI path is not allowed to contain arbitrary UTF-8 characters. Non-ASCII UTF-8 characters must be percent-encoded, and even some ASCII characters must be percent-encoded. It's true that the URI library doesn't support IRIs. That's not a bug; there should probably be a separate library for IRIs.
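A minimal sketch of the point above, using only Ruby's standard uri library: the default parser rejects a raw non-ASCII path, while the percent-encoded form parses. The helper `percent_encode_path` is hypothetical (not part of the uri library). It conservatively encodes every byte except RFC 3986 unreserved characters and "/", and it assumes the input contains no existing percent-escapes.

```ruby
require "uri"

# Hypothetical helper, not part of Ruby's uri library.
# Keeps RFC 3986 unreserved characters and "/" as-is and percent-encodes
# every other byte of the UTF-8 representation. It assumes the input has
# no existing %XX escapes (a literal "%" would become "%25").
SAFE_PATH_BYTES = (("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a + %w[- . _ ~ /]).map(&:ord)

def percent_encode_path(path)
  path.encode("UTF-8").each_byte.map do |byte|
    SAFE_PATH_BYTES.include?(byte) ? byte.chr : format("%%%02X", byte)
  end.join
end

# A raw non-ASCII path is rejected by the default parser.
begin
  URI.parse("http://example.com/café")
rescue URI::InvalidURIError => e
  puts "rejected: #{e.class}"
end

# The percent-encoded path is plain ASCII and parses fine.
uri = URI.parse("http://example.com" + percent_encode_path("/café"))
puts uri.path # => /caf%C3%A9
```

Encoding sub-delimiters such as "!" or "=" is stricter than RFC 3986 requires for paths; it keeps the sketch short rather than reflecting what a real library should do.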
IRIs were not merged into the URI specification in order to preserve backward compatibility, but IRI extends URI.
IRIs are meant to replace URIs in identifying resources for protocols, formats, and software components that use a UCS-based character repertoire.
Ruby has extensive Unicode support (in strings, regexps, etc.), so the uri module's lack of Unicode support stands out as an exception.
If one does not want to change the behavior of the default `parse` method, maybe the uri module could add a `:unicode` / `:iri` (or similar) option to `parse`, or an alternative method `parse_iri` that would accept an IRI, map it to a URI, and then pass the resulting URI to the classic `parse` method, which handles only ASCII URIs. RFC 3987 explains how to map an IRI to a URI and a URI to an IRI.
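A minimal sketch of that proposal, assuming the input IRI is already a Unicode (UTF-8) string, so RFC 3987 requires no normalization step. `iri_to_uri`, `parse_iri`, and `uri_to_iri` are hypothetical names: the thread suggests `parse_iri`, but nothing like it exists in the uri library. Host names are not handled; RFC 3987 also allows converting them with IDNA (Punycode), which this sketch does not attempt. It only demonstrates non-ASCII characters in the path and query.

```ruby
require "uri"

# Hypothetical helpers sketching the IRI <-> URI mappings of RFC 3987;
# neither direction exists in Ruby's uri library.

# IRI -> URI (RFC 3987, section 3.1): replace every non-ASCII character
# with the percent-encoded bytes of its UTF-8 encoding. ASCII characters,
# including existing %XX escapes, are left untouched.
def iri_to_uri(iri)
  iri.encode("UTF-8").gsub(/[^\x00-\x7F]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

# The parse_iri idea from this thread: map the IRI to a URI, then
# hand the ASCII-only result to the existing URI.parse.
def parse_iri(iri)
  URI.parse(iri_to_uri(iri))
end

# URI -> IRI (simplified from RFC 3987, section 3.2): decode runs of
# percent-encoded non-ASCII bytes only when the whole run is valid UTF-8.
# Encoded ASCII such as %2F stays encoded, since decoding it could change
# the URI's meaning. The RFC's other exclusions (e.g. bidi formatting
# characters) are not handled here.
def uri_to_iri(uri)
  uri.gsub(/(?:%[89A-Fa-f][0-9A-Fa-f])+/) do |run|
    decoded = [run.delete("%")].pack("H*").force_encoding("UTF-8")
    decoded.valid_encoding? ? decoded : run
  end
end

uri = parse_iri("https://example.com/wiki/Café?q=ü")
puts uri.path              # => /wiki/Caf%C3%A9
puts uri.query             # => q=%C3%BC
puts uri_to_iri(uri.to_s)  # => https://example.com/wiki/Café?q=ü
```

Wrapping the existing parser this way leaves the behavior of `URI.parse` unchanged, which is the backward-compatibility concern raised above.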
Since IRI extends URI and is closely tied to it, I would rather see IRI support integrated as new methods in the URI module than in a separate module just for IRIs. But that's just my point of view, and I may not be the best suited or most experienced person here.
Regarding "That's not a bug": I agree, it's more of a feature request to support modern usage, where Unicode is widespread and ubiquitous.
Just ran into this today... noraj's comments above seem spot-on to me.
Hi, I'm getting this error:
I thought that the path component is allowed to contain any UTF-8 character.