libwww-perl / URI

The Perl URI module
https://metacpan.org/pod/URI
Other
41 stars 48 forks source link

Square brackets in path element not escaped. #99

Closed Perlbotics closed 2 years ago

Perlbotics commented 2 years ago

URI version: 5.10 (also tested: 5.05, 1.73)

perl -MURI -E 'say URI->new("https://www.example.com/xy[z]abc")->canonical'
https://www.example.com/xy[z]abc

Expecting: https://www.example.com/xy%5Bz%5Dabc

RFC 3986 - 3.2.2 Host ... A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax. ...

RFC 3986 - Appendix D1: Additions ... Square brackets are now specified as reserved within the authority component and are not allowed outside their use as delimiters for an IP literal within host. ...

Hope, I have not misinterpreted the RFC. Thank you for consideration.

oalders commented 2 years ago

Thanks, @Perlbotics! We'll need to fix this. If you'd like to take that on, just let us know.

Perlbotics commented 2 years ago

Hi @oalders, I am not (too) familiar with the inner workings of this module. So, I tried to fix my problem and it seems that _server.pm is the right place. I made it work for the constructor part and some of the setters w/o breaking the test suite (while detecting that something seems to be wrong with '?' and '#' being percent encoded - but perhaps, I used the API wrong?).

However, this quick fix is kind of inverting the problem space by fixing everything except the host part while it is the host part that would need special treatment.

URI.pm assumes that a reserved character is reserved everywhere in the URI string, but the square brackets are the first set of characters that are restricted to the host part only to mark IPv6 and future versions while everywhere else, they need to be percent encoded.

Consequently, '[' and ']' should be removed from $URI::reserved but that will break a lot of test cases. These could be fixed for the URI module, but I fear to break legacy code and modules depending on URI.

How to handle this situation? Feature flags, restrict scope to _generic.pm or _server.pm, blindly enforce the standard, etc. ?

One could split $URI::reserved into $URI::reserved (w/o square brackets) and $URI::reserved_for_host in a first step and then treat them differently for the host part. Swapping '[' and ']' between these strings would allow an easy fallback to the previous behavior while adding a bit more complexity to the code base.

What is your advice here?

Thanks.

oalders commented 2 years ago

One could split $URI::reserved into $URI::reserved (w/o square brackets) and $URI::reserved_for_host in a first step and then treat them differently for the host part.

This sounds like a good idea to me. That would be a good first step. We could also add some tests for downstream dependencies to the GitHub workflow to ensure that CPAN modules with lots of dependencies are not affected.

Perlbotics commented 2 years ago

Hi Olaf( @oalders). I've forked URI and provided a patch in branch bug99. The basic approach was to escape all '[' and ']' first and then undo again in the host part of the authority section. It was possible to do this in URI.pm so it should have a broad coverage and only t/old-base.t had a single failed test. The old behavior can be restored setting an environment variable (see perdoc of URI.pm). I didn't dare to create a URI::import() to switch this off/on. Also, I wanted to set the variables only once during initialisation since I don't know what side effect a later change on the many /o regex-modifiers has in older perls. Without this "legacy" option, the code would be cleaner. The test coverage is poor, since only new() and canonical() were tested as well as the switch from/to 'legacy' mode. README.MD was also not updated. I havn't run distzilla yet. The documentation might need a review too since I am not a native speaker. At least a first step has been made.

oalders commented 2 years ago

Thanks, @Perlbotics! If you create a pull request, it will be a touch easier to review and discuss.

README.MD was also not updated. I havn't run distzilla yet.

That will all happen automatically at release time, so it's actually easier to review the diff if you don't run dzil on your changes. 😄

At least a first step has been made.

🎊

oalders commented 2 years ago

The documentation might need a review too since I am not a native speaker.

Not to worry. Even native speakers need their docs reviewed!

oalders commented 2 years ago

Closed via #100