servo / rust-url

URL parser for Rust
https://docs.rs/url/
Apache License 2.0
1.31k stars 325 forks source link

Fix anarchist URL where path starts with // #817

Closed qsantos closed 1 year ago

qsantos commented 1 year ago

This PR closes https://github.com/servo/rust-url/issues/799 by refusing to parse URLs with no authority whose normalized path would start with a double-slash.


The issue comes from the remove dot segments step. Let's consider the URI m:/.//. Then, according to the RFC:

B. if the input buffer begins with a prefix of "/./" […], then replace that prefix with "/"

So it should be normalized to m://, but this has different semantics (resulting in \ being interpreted as being part of the authority in the original example).

I have conducted a few tests with some URI normalization libraries:

$ node test.js
http://example.com/a/c/d
m:/.//
$ cat test.php
<?php
require_once 'URLNormalizer.php';
echo (new Normalizer('http://example.com/a/b/../c/d'))->normalize(), "\n";
echo (new Normalizer('m:/.//'))->normalize(), "\n";
?>
$ php test.php
http://example.com/a/c/d
m:/
$ cat test.pl
use feature qw(say);
use URI::Normalize qw(normalize_uri);
say normalize_uri(URI->new('http://example.com/a/b/../c/d'));
say normalize_uri(URI->new('m:/.//'));
$ perl test.pl
http://example.com/a/c/d
m://
$ cat test.rb
require 'addressable/uri'
p Addressable::URI.parse('http://example.com/a/b/../c/d').normalize.to_s
p Addressable::URI.parse('m:/.//').normalize.to_s
$ ruby test.rb
"http://example.com/a/c/d"
/usr/share/rubygems-integration/all/gems/addressable-2.8.1/lib/addressable/uri.rb:2487:in `validate': Cannot have a path with two leading slashes without an authority set: 'm://' (Addressable::URI::InvalidURIError)
    from /usr/share/rubygems-integration/all/gems/addressable-2.8.1/lib/addressable/uri.rb:2410:in `defer_validation'
    from /usr/share/rubygems-integration/all/gems/addressable-2.8.1/lib/addressable/uri.rb:839:in `initialize'
    from /usr/share/rubygems-integration/all/gems/addressable-2.8.1/lib/addressable/uri.rb:2184:in `new'
    from /usr/share/rubygems-integration/all/gems/addressable-2.8.1/lib/addressable/uri.rb:2184:in `normalize'
    from test.rb:3:in `<main>'

For PHP, I am using https://github.com/glenscott/url-normalizer. In short:

I feel like Ruby's is the most consistent and straightforward solution.

This does mean that the no_panic test from https://github.com/servo/rust-url/issues/654 must be amended. However, we can cover the m:/.// case in a dedicated unit test.

valenting commented 1 year ago

You reference an RFC. This crate explicitly only implements the URL Standard. According to the reference URL parser m:/.//\\ should parse successfully

qsantos commented 1 year ago

My bad. Thanks for the proper reference. Then, it looks like this comes from the serialization part, more specifically, from section 4.5, item 3, whose note explicitly mentions the case we are discussing. I have correspondingly moved the correction to the serialization.

qsantos commented 1 year ago

Rebased to take into account the new tests

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 83.87% and project coverage change: -0.02 :warning:

Comparison is base (edeaea7) 82.74% compared to head (fd29f5f) 82.73%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #817 +/- ## ========================================== - Coverage 82.74% 82.73% -0.02% ========================================== Files 20 20 Lines 3303 3330 +27 ========================================== + Hits 2733 2755 +22 - Misses 570 575 +5 ``` | [Impacted Files](https://codecov.io/gh/servo/rust-url/pull/817?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=servo) | Coverage Δ | | |---|---|---| | [url/src/slicing.rs](https://codecov.io/gh/servo/rust-url/pull/817?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=servo#diff-dXJsL3NyYy9zbGljaW5nLnJz) | `91.93% <80.00%> (-1.17%)` | :arrow_down: | | [url/src/parser.rs](https://codecov.io/gh/servo/rust-url/pull/817?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=servo#diff-dXJsL3NyYy9wYXJzZXIucnM=) | `81.38% <80.95%> (+0.06%)` | :arrow_up: | | [url/src/lib.rs](https://codecov.io/gh/servo/rust-url/pull/817?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=servo#diff-dXJsL3NyYy9saWIucnM=) | `76.47% <100.00%> (+0.12%)` | :arrow_up: | | [data-url/tests/wpt.rs](https://codecov.io/gh/servo/rust-url/pull/817?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=servo#diff-ZGF0YS11cmwvdGVzdHMvd3B0LnJz) | `84.41% <0.00%> (-1.30%)` | :arrow_down: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=servo). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=servo)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

qsantos commented 1 year ago

I have moved the handling of empty fragment to with_query_and_fragment. That way, it should be able to normalize URLs whether they are a fresh parse, or a combination of an existing URL and a relative path.

qsantos commented 1 year ago

Thanks for the reactivity and the detailed comments!