servo / rust-url

URL parser for Rust
https://docs.rs/url/
Apache License 2.0

Is it possible to force the %-encoding of `+`? #882

Open grahamc opened 10 months ago

grahamc commented 10 months ago

(but I'm hoping ...)

Describe the bug

This code sample, whose behavior is clearly standards-correct, emits https://example.com/foo+bar:

use url::Url;

let mut url = Url::parse("https://example.com/").unwrap();
// push() percent-encodes the segment, but '+' is left as-is
url.path_segments_mut().unwrap().push("foo+bar");

println!("{}", url); // prints https://example.com/foo+bar

However, some systems, like AWS S3, incorrectly interpret a + in the path as a space:

$ curl 'https://xxx.s3.amazonaws.com/v0.15.1+xxx/xxx'
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Key>v0.15.1 xxx/xxx</Key>
  ...
</Error>

S3 does decode %2B as a +, which is the behavior I'm hoping for.

This puts me in a pickle. If I percent-encode the + myself before passing the segment in, the % gets (again, properly) encoded, so the path ends up containing %252B, which S3 obviously doesn't understand either.

Is there a way to get the behavior I'm looking for? Or do I have to take the serialized path and implement my own string replacement of + with %2B?
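
For reference, a minimal sketch of that string-replacement workaround, assuming + only ever enters the URL through path segments I push myself and that no query or fragment has been set yet (where a literal + could carry meaning):

use url::Url;

let mut url = Url::parse("https://example.com/").unwrap();
url.path_segments_mut().unwrap().push("foo+bar");

// path_segments_mut() never produces a literal '+' itself, so any '+' in the
// serialized URL came from our own input and can be re-escaped wholesale.
let reencoded = url.as_str().replace('+', "%2B");
let url = Url::parse(&reencoded).unwrap();

assert_eq!(url.as_str(), "https://example.com/foo%2Bbar");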

valenting commented 10 months ago

It seems the issue only happens when using path_segments_mut, because of the code here. I think % was added to PATH_SEGMENT & SPECIAL_PATH_SEGMENT to prevent you from setting the path to %2e, which is not allowed.

let x = url::Url::parse("https://example.com/foo%2bbar");

works as expected.
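
A minimal sketch confirming that, assuming the + is escaped before the string ever reaches the parser:

use url::Url;

// The parser leaves existing percent-escapes in the path untouched,
// so %2b survives serialization instead of being double-encoded.
let url = Url::parse("https://example.com/foo%2bbar").unwrap();
assert_eq!(url.path(), "/foo%2bbar");
println!("{}", url); // https://example.com/foo%2bbar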