Closed Perlbotics closed 2 years ago
Thanks, @Perlbotics! We'll need to fix this. If you'd like to take that on, just let us know.
Hi @oalders, I am not (too) familiar with the inner workings of this module. So, I tried to fix my problem and it seems that _server.pm is the right place. I made it work for the constructor part and some of the setters w/o breaking the test suite (while detecting that something seems to be wrong with '?' and '#' being percent encoded - but perhaps, I used the API wrong?).
However, this quick fix is kind of inverting the problem space by fixing everything except the host part while it is the host part that would need special treatment.
URI.pm assumes that a reserved character is reserved everywhere in the URI string, but the square brackets are the first set of characters that are restricted to the host part only to mark IPv6 and future versions while everywhere else, they need to be percent encoded.
Consequently, '[' and ']' should be removed from $URI::reserved but that will break a lot of test cases. These could be fixed for the URI module, but I fear to break legacy code and modules depending on URI.
How to handle this situation? Feature flags, restrict scope to _generic.pm or _server.pm, blindly enforce the standard, etc. ?
One could split $URI::reserved into $URI::reserved (w/o square brackets) and $URI::reserved_for_host in a first step and then treat them differently for the host part. Swapping '[' and ']' between these strings would allow an easy fallback to the previous behavior while adding a bit more complexity to the code base.
What is your advice here?
Thanks.
One could split $URI::reserved into $URI::reserved (w/o square brackets) and $URI::reserved_for_host in a first step and then treat them differently for the host part.
This sounds like a good idea to me. That would be a good first step. We could also add some tests for downstream dependencies to the GitHub workflow to ensure that CPAN modules with lots of dependencies are not affected.
Hi Olaf( @oalders). I've forked URI and provided a patch in branch bug99. The basic approach was to escape all '[' and ']' first and then undo again in the host part of the authority section. It was possible to do this in URI.pm so it should have a broad coverage and only t/old-base.t had a single failed test. The old behavior can be restored setting an environment variable (see perdoc of URI.pm). I didn't dare to create a URI::import() to switch this off/on. Also, I wanted to set the variables only once during initialisation since I don't know what side effect a later change on the many /o regex-modifiers has in older perls. Without this "legacy" option, the code would be cleaner. The test coverage is poor, since only new() and canonical() were tested as well as the switch from/to 'legacy' mode. README.MD was also not updated. I havn't run distzilla yet. The documentation might need a review too since I am not a native speaker. At least a first step has been made.
Thanks, @Perlbotics! If you create a pull request, it will be a touch easier to review and discuss.
README.MD was also not updated. I havn't run distzilla yet.
That will all happen automatically at release time, so it's actually easier to review the diff if you don't run dzil
on your changes. 😄
At least a first step has been made.
🎊
The documentation might need a review too since I am not a native speaker.
Not to worry. Even native speakers need their docs reviewed!
Closed via #100
URI version: 5.10 (also tested: 5.05, 1.73)
Expecting:
https://www.example.com/xy%5Bz%5Dabc
RFC 3986 - 3.2.2 Host ... A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax. ...
RFC 3986 - Appendix D1: Additions ... Square brackets are now specified as reserved within the authority component and are not allowed outside their use as delimiters for an IP literal within host. ...
Hope, I have not misinterpreted the RFC. Thank you for consideration.