servo / rust-url

URL parser for Rust
https://docs.rs/url/
Apache License 2.0
1.31k stars 325 forks source link

`=` is not being escaped as query value #907

Closed da-kami closed 7 months ago

da-kami commented 7 months ago

This is related to https://github.com/servo/rust-url/issues/688 but I could pin the problem to = in the query string.

I would have expected that the parsed URL would be escaped, but it is not:

#[test]
    fn query_string_encode() {
        let url = Url::parse("https://www.example.com?q=asdf=").unwrap();
        let query = url.query().unwrap().to_string();
        assert_eq!(query, "q=asdf%3D");
    }
assertion `left == right` failed
  left: "q=asdf="
 right: "q=asdf%3D"

This is a problem specifically for = in a query string - other characters are escaped correctly.

e.g. this test passes:

    #[test]
    fn query_string_encode() {
        let url = Url::parse("https://www.example.com?q=asdf中文字").unwrap();
        let query = url.query().unwrap().to_string();
        assert_eq!(query, "q=asdf%E4%B8%AD%E6%96%87%E5%AD%97");
    }
valenting commented 7 months ago

This is related to #688 but I could pin the problem to = in the query string. I would have expected that the parsed URL would be escaped, but it is not: let url = Url::parse("https://www.example.com?q=asdf=").unwrap();

There are two = characters in your input. Why would one be escaped while the other one isn't? The URL query percent encode set does not contain the = character, so it's not escaped. This is consistent with the reference URL parser and browser implementations.

da-kami commented 7 months ago

I would have expected it to be escaped because it is part of the value, not the key part of this query parameter. The delimiter for multiple query params is typically &, and I would have attributed anything between the first = and the next delimiter (be it &, / or the end of the string) to the value.

But as you pointed out the accepted convention doesn't escape = in this context. It appears that squid proxy does escape it, so I assumed it would make sense give that it is a common proxy out there. I stand corrected :) Thanks for adding the references!