servo / rust-url

URL parser for Rust
https://docs.rs/url/
Apache License 2.0

[DataUrl] Unable to parse application/json;utf8 containing # #908

Closed bh2smith closed 4 months ago

bh2smith commented 4 months ago

Describe the bug

Here is a test case demonstrating that valid JSON documents containing # char are not parsed correctly:

    use data_url::DataUrl;
    use serde_json::Value;

    #[test]
    fn tiny_example() {
        fn parse(data: &str) -> anyhow::Result<Value> {
            let data_url = DataUrl::process(data)?;
            // Decode first: the fragment (if any) is returned alongside the body.
            let (body, fragment) = data_url.decode_to_vec()?;
            if let Some(frag) = fragment {
                println!("Fragment {}", frag.to_percent_encoded());
            }
            let data: Value = serde_json::from_slice(&body)?;
            Ok(data)
        }

        let data = r#"data:application/json;utf8,{"name":"Good number 1"}"#;
        assert!(parse(data).is_ok());
        let data = r#"data:application/json;utf8,{"name":"Bad #1"}"#;
        assert_eq!(
            parse(data).unwrap_err().to_string(),
            "EOF while parsing a string at line 1 column 13"
        );
    }

What appears to be happening is that the # character is interpreted as the beginning of a fragment (the print statement in the code confirms this).

valenting commented 4 months ago

This is the correct way to parse the URL. See the reference URL parser.

The parser doesn't understand what JSON is, so data:application/json;utf8,{"name":"Bad #1"} and data:application/json;utf8,{"name":"Bad"}#hash are parsed in the same way: everything after the first # is treated as the fragment.
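To illustrate why both inputs behave alike, here is a minimal std-only sketch of the fragment split. `split_fragment` is a hypothetical helper written for this comment, not a function from the `url` or `data-url` crates, and it only mimics the one step that matters here: cutting the input at the first `#` before the body is ever looked at.

```rust
/// Hypothetical helper (not part of the `url` or `data-url` crates).
/// Splits a URL string at the first '#', the way a URL parser separates
/// the fragment before any media-type-specific decoding happens.
fn split_fragment(url: &str) -> (&str, Option<&str>) {
    match url.split_once('#') {
        // Everything after the first '#' is the fragment.
        Some((before, fragment)) => (before, Some(fragment)),
        // No '#': the whole input is the URL, with no fragment.
        None => (url, None),
    }
}
```

With the failing input from the test case, the part before the fragment ends at `{"name":"Bad ` (13 bytes of body), which is consistent with the serde_json error reported at column 13.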

If you want to include JSON containing a hash, I would recommend either escaping the hash, as in data:application/json;utf8,{"name":"Bad %231"}, or better yet using a base64 data URL: data:application/json;base64,eyJuYW1lIjoiQmFkICMxIn0