Open kastakhov opened 3 months ago
Thanks very much for this @kastakhov. If we could tweak the regex in both places as suggested, that would be helpful.
Hi, @oalders, @genio, any another suggestions for these changes or now it looks good for you?
This is very adjacent to this issue, and feel free to tell me to open a different issue/PR, but...
The user/password methods allow bad data to pollute the URL (lines 45 and the similar line 22), because it is only doing a very light set of manual URL-escaping. It should, at a minimum escape potential @
characters in the password ($new =~ s/@/%40/g
), and at a maximum, just call uri_escape
for both user/password sets.
Is there a good reason not to just call uri_escape
?
I still don't think this is a good idea, as you're supposed to URI-escape things. Here's an example with both Mojo and URI:
#!/usr/bin/env perl
use v5.36;
use strict;
use warnings;
use Mojo::URL ();
use Mojo::Util ();
use URI ();
use URI::Escape ();
my $unescaped_u = 'u1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $unescaped_p = 'p1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
{
# mojolicious way:
my $username = Mojo::Util::url_escape($unescaped_u);
my $password = Mojo::Util::url_escape($unescaped_p);
my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
# say "Mojo: ", $foo;
my $url = Mojo::URL->new($foo);
# say "Mojo: ", $url->to_string();
say "Mojo: ", $url->userinfo();
}
{
# URI way
my $username = URI::Escape::uri_escape($unescaped_u);
my $password = URI::Escape::uri_escape($unescaped_p);
my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
# say "URI: ", $foo;
my $url = URI->new($foo);
# say "URI: ", $url->as_string();
say "URI: ", URI::Escape::uri_unescape($url->userinfo());
}
Yielding:
% perl url.pl
Mojo: u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
URI: u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO, [^:]+:[^@]+@
should work to capture userinfo
without escaping out to parts that it shouldn't.
Also, I'll just create a different issue for the user/password set problem.
Well, disregard my user/password set problem. I realized that userinfo
is running through URI::Escape::escape_char($1)
and everything works correctly, as long as it didn't start off confused (ie: URI->new with reserved characters in the password).
While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO,
[^:]+:[^@]+@
should work to captureuserinfo
without escaping out to parts that it shouldn't.Also, I'll just create a different issue for the user/password set problem.
Just in case, originally I faced with this issue here..
Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly. 🙃
The seemingly relevant RFC pieces are linked here: https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.1 https://datatracker.ietf.org/doc/html/rfc3986#section-7.3 https://datatracker.ietf.org/doc/html/rfc3986#section-7.5
I think my cursory reading of these and understanding thus far leads me to believe we should leave things as-is and maybe provide documentation for how to provide complex authentication using URI::Escape::uri_escape
. I believe making changes to the behavior here could result in unexpected double percent encoding for some folks who are already doing it the way in the sample script above.
I found Section 2.2 earlier, but didn't see where those declarations were used. Good find.
So, this means we should stay away from unnecessarily parsing gen-delims
in the userinfo
section. At the very least, .*
is way too broad.
Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly.
Again, a delimiter has to exist unescaped. There's no such thing as a parser algorithm that can take in every ASCII or Unicode unescaped character as a sub-field of the whole string. At some point, we have to force the user to escape their data before it gets dropped into a URL.
Fix for #143