libwww-perl / URI

The Perl URI module
https://metacpan.org/pod/URI
Other
57 stars 48 forks source link

fix userinfo in case of some special characters #144

Open kastakhov opened 3 months ago

kastakhov commented 3 months ago

Fix for #143

oalders commented 3 months ago

Thanks very much for this @kastakhov. If we could tweak the regex in both places as suggested, that would be helpful.

kastakhov commented 3 months ago

Hi, @oalders, @genio, any another suggestions for these changes or now it looks good for you?

SineSwiper commented 2 months ago

This is very adjacent to this issue, and feel free to tell me to open a different issue/PR, but...

The user/password methods allow bad data to pollute the URL (lines 45 and the similar line 22), because it is only doing a very light set of manual URL-escaping. It should, at a minimum escape potential @ characters in the password ($new =~ s/@/%40/g), and at a maximum, just call uri_escape for both user/password sets.

oalders commented 2 months ago

Is there a good reason not to just call uri_escape?

genio commented 2 months ago

I still don't think this is a good idea, as you're supposed to URI-escape things. Here's an example with both Mojo and URI:

#!/usr/bin/env perl

use v5.36;
use strict;
use warnings;

use Mojo::URL ();
use Mojo::Util ();
use URI ();
use URI::Escape ();

my $unescaped_u = 'u1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $unescaped_p = 'p1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';

{
    # mojolicious way:
    my $username = Mojo::Util::url_escape($unescaped_u);
    my $password = Mojo::Util::url_escape($unescaped_p);
    my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
    # say "Mojo: ", $foo;
    my $url = Mojo::URL->new($foo);
    # say "Mojo: ", $url->to_string();
    say "Mojo: ", $url->userinfo();
}

{
    # URI way
    my $username = URI::Escape::uri_escape($unescaped_u);
    my $password = URI::Escape::uri_escape($unescaped_p);
    my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
    # say "URI:  ", $foo;
    my $url = URI->new($foo);
    # say "URI:  ", $url->as_string();
    say "URI:  ", URI::Escape::uri_unescape($url->userinfo());
}

Yielding:

% perl url.pl
Mojo: u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
URI:  u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
SineSwiper commented 2 months ago

While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO, [^:]+:[^@]+@ should work to capture userinfo without escaping out to parts that it shouldn't.

Also, I'll just create a different issue for the user/password set problem.

SineSwiper commented 2 months ago

Well, disregard my user/password set problem. I realized that userinfo is running through URI::Escape::escape_char($1) and everything works correctly, as long as it didn't start off confused (ie: URI->new with reserved characters in the password).

kastakhov commented 2 months ago

While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO, [^:]+:[^@]+@ should work to capture userinfo without escaping out to parts that it shouldn't.

Also, I'll just create a different issue for the user/password set problem.

Just in case, originally I faced with this issue here..

kastakhov commented 2 months ago

Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly. 🙃

genio commented 2 months ago

The seemingly relevant RFC pieces are linked here: https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.1 https://datatracker.ietf.org/doc/html/rfc3986#section-7.3 https://datatracker.ietf.org/doc/html/rfc3986#section-7.5

I think my cursory reading of these and understanding thus far leads me to believe we should leave things as-is and maybe provide documentation for how to provide complex authentication using URI::Escape::uri_escape. I believe making changes to the behavior here could result in unexpected double percent encoding for some folks who are already doing it the way in the sample script above.

SineSwiper commented 2 months ago

I found Section 2.2 earlier, but didn't see where those declarations were used. Good find.

So, this means we should stay away from unnecessarily parsing gen-delims in the userinfo section. At the very least, .* is way too broad.

Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly.

Again, a delimiter has to exist unescaped. There's no such thing as a parser algorithm that can take in every ASCII or Unicode unescaped character as a sub-field of the whole string. At some point, we have to force the user to escape their data before it gets dropped into a URL.