servo / rust-url

URL parser for Rust
https://docs.rs/url/
Apache License 2.0
1.31k stars 325 forks source link

encode space to '%20' as per url standard #928

Open kaffarell opened 5 months ago

kaffarell commented 5 months ago

Previously the space character was exclusively encoded to '+'. This is wrong, as the URL Standard 0 specifies that the default is '%20'. Another function has been introduced as well, which replicates the old behavior and converts spaces to '+'. Notice that this breaks the default behavior and could lead to bugs.

Fixes: #927 Fixes: #888

kaffarell commented 5 months ago

@valenting let me know what you think!

Manishearth commented 5 months ago

I am wary about breaking upstream users, especially the browser users that use this crate (who will likely care about nitty gritty url spec behavior)

@valenting @mrobinson can you check whether the Gecko/Servo callsites are all ones that need spaceIsPlus by default?

We can invert which behavior is the default here to solve this if needed.

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 50.00000% with 1 lines in your changes are missing coverage. Please review.

:exclamation: No coverage uploaded for pull request base (main@de947ab). Click here to learn what that means.

:exclamation: Current head c23893a differs from pull request most recent head 4e3f1b5. Consider uploading reports for the commit 4e3f1b5 to get more accurate results

Files Patch % Lines
form_urlencoded/src/lib.rs 50.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #928 +/- ## ======================================= Coverage ? 81.88% ======================================= Files ? 20 Lines ? 3550 Branches ? 0 ======================================= Hits ? 2907 Misses ? 643 Partials ? 0 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

valenting commented 5 months ago

I am wary about breaking upstream users, especially the browser users that use this crate (who will likely care about nitty gritty url spec behavior)

@valenting @mrobinson can you check whether the Gecko/Servo callsites are all ones that need spaceIsPlus by default?

We can invert which behavior is the default here to solve this if needed.

As far as I can tell servo doesn't use this method https://github.com/search?q=repo%3Aservo%2Fservo+form_urlencoded&type=code

Nor does Gecko https://searchfox.org/mozilla-central/search?q=byte_serialize&path=&case=false&regexp=false

This will be a breaking change anyway.

Manishearth commented 5 months ago

Since this crate is used a lot should we implement this in a way that is non breaking (with reversed defaults, strongly documented), and perhaps file an issue for a potential 3.0 where we start listing breaking changes?

I suspect 99% of users will not care about this anyway. I feel like doing a 3.0 just for this is a bit much.

kaffarell commented 5 months ago

I am wary about breaking upstream users, especially the browser users that use this crate (who will likely care about nitty gritty url spec behavior)

This is a breaking change, although a very small one. As you said it will probably only affect those who care about the url spec, which in turn, should know this case already!

Since this crate is used a lot should we implement this in a way that is non breaking (with reversed defaults, strongly documented), and perhaps file an issue for a potential 3.0 where we start listing breaking changes?

I have to disagree with the reversed defaults. It's better if we do a 3.0 release and keep it this way, otherwise we will not be 'url-spec compliant'. The url-spec clearly states that the default-behavior should be '%20' and not '+'.

IMO a 3.0 release (or even a minor release if we're feeling adventurous :) ) is the way to go!

gibfahn commented 3 months ago

It would be nice to have an opt-in way to do this at least (which wouldn't have to be a breaking change hopefully), that way users who need this encoding can still use this crate while plans for a breaking change are worked out.

If nothing else this will probably break a bunch of tests.