Closed mxaddict closed 4 months ago
Printing the host value it's returning "https
Which is not a valid host if i recall correctly from the URL standards.
According to the URL standard, it's a valid URL. https://jsdom.github.io/whatwg-url/#url=aHR0cDovLyJodHRwcy8vdGVzdC5jb20vc29tZS9wYXRo&base=YWJvdXQ6Ymxhbms=
" is not an invalid character.
Hmm, so the issue is with reqwest then, cause I'm passing URL to reqwest after parsing with url::Url::parse(), and reqwest is complaining about invalidUriChar
Moving the bug report to reqwest then, thanks for the clarification
Hmm, after doing some more digging, seems that quotes are not a valid hostname character: Check RFC 3986.
I also tested in chrome and it does not thing that hostname with quotes is a valid url, just paste the test url from my code and test it in chrome.
Well, it's a valid by the URL standard. But "
is not a valid DNS name character, so obviously the request will fail.
I'm assuming that's why it's failing, and not exactly while parsing the URL - which as you observed, does successfully parse.
Hmm, after doing some more digging, seems that quotes are not a valid hostname character: Check RFC 3986.
I also tested in chrome and it does not thing that hostname with quotes is a valid url, just paste the test url from my code and test it in chrome.
Firefox and Chrome don't accept the character either. But still the URL standard says it's valid and that's what this crate implements.
I see, so it's allowed as a URI but not allowed by the HTTP url standard?
Hmmm, not sure where the fix for this should be then.
I think this section of RFC 3986 might apply here:
host = IP-literal / IPv4address / reg-name
IP-literal = "[" ( IPv6address / IPvFuture ) "]"
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
reg-name = *( unreserved / pct-encoded / sub-delims )
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
Wouldn't this imply that hostname should be only apha-numeric with the allowed chars?
I don't see how "
would be valid in this case.
Anyway, thanks for the help, might have to add a second layer of validation on my code between rust-url parse and reqwest lib.
This seems like a niche situation 🤣
I just added this to my code after the URL parse step (which should work for my usecase)
if url_start.host_str().unwrap().contains("\"") {
update_url(&pool, url, &vec![], "", URL_ERR_HOST).await;
thread_error_count += 1;
thread_done_count += 1;
continue;
}
Thanks for the info, love this crate by the way, has helped me a ton! 👍🏼
Have this code:
Expected this to parse and say that the url is invalid.
Instead it's parsing and saying it's valid.
See attached screenshot: