servo / rust-url

URL parser for Rust
https://docs.rs/url/
Apache License 2.0
1.29k stars 322 forks source link

Properly log validation errors related to legacy IPv4 notations #498

Open zackw opened 5 years ago

zackw commented 5 years ago

The current rev of the URL Standard specifies that the various legacy notations accepted by inet_aton for IPv4 addresses are validation errors, but rust-url silently accepts them (they're not even diagnosed via the syntax violation callback). I would like to have an option to reject these.

Examples of undesirable constructs:

# all are treated as equivalent to http://127.0.0.1/
http://127.0.1/
http://127.1/
http://0177.0.1/
http://0177.0.0.1/
http://0177.1/
http://0x7f.0.1/
http://0x7f.0.0.1/
http://0x7f.1/
http://2130706433/
http://0x7f000001/
http://017700000001/

# equivalent to http://127.0.0.8/ , not http://127.0.0.10/
http://0127.0.0.010/

See https://url.spec.whatwg.org/#concept-ipv4-parser (pay close attention to where it, and its subroutine https://url.spec.whatwg.org/#ipv4-number-parser , set the "validationErrorFlag") for gory details.

Previous discussion in #116.

dekellum commented 5 years ago

I wasn't aware of this legacy edge case (thanks!), but I agree with your motivation. Seems exploitable for misuse. For example servers might have good reason to want to reject provided URLs to localhost. Do common IPv4 resolvers actually accept and resolve these examples to local ports?

If the violation callback was extended for this case, you would have the option of promoting the violation to your own error or other form of rejection? That would be a light-weight and (I think, reasonably) backward-compatible way to achieve it, while still making the implementation available in this crate. I'd use that!

zackw commented 5 years ago

That would work for me except that in #462 it’s proposed to remove the syntax violation callback.

dekellum commented 5 years ago

Indeed! I've suggested keeping it. :)

SimonSapin commented 5 years ago

Aligning with the spec in terms of what inputs log a syntax violation sounds good.

zackw commented 5 years ago

@dekellum re

Do common IPv4 resolvers actually accept and resolve these examples to local ports?

The POSIX spec for getaddrinfo says that "address strings using Internet standard dot notation as specified in inet_addr are valid", and that does include all of the unusual syntaxes I listed.

dekellum commented 5 years ago

@SimonSapin : Thanks. I think anyone implementing would be concerned about #462 Is it a reasonable time to decide that either way?

nox commented 5 years ago

AFAIK if we want to do that we need a new variant for the syntax violation enum, so this will be a breaking change, so we want to do that for 2.0 I guess.

dekellum commented 5 years ago

Would discouraging exhaustive matching (in itself a breaking change) on SyntaxViolation, and ideally ParseError, be good for 2.0, making this non-breaking when available? I can't parse the tracking in rust-lang/rust#44109 or guess what that stable MSRV would be, so likely the old fashion manual way with an unused variant would be best for now.

SimonSapin commented 5 years ago

It looks like #[non_exhaustive] is not stable in any release so far.

dekellum commented 5 years ago

Okay, would you entertain a PR to add an unused variant and rustdoc to each of these enums?

dekellum commented 5 years ago

Also, having looked at these error types, they are in need of some other modernizations, like dropping description() impl and replacing with a Display impls. I could do that in same PR. Of interest?

nox commented 5 years ago

I think it would be ok adding a doc(hidden) variant and making it extra clear in the docs that this enum is never expected to be exhaustively matched.

nox commented 5 years ago

like dropping description() impl

Nah please leave that alone, no need to regress things for no reason and the implementation is trivial. You can definitely add a Display impl though.

dekellum commented 5 years ago

IIRC, currentstd::error::Error default impls (now deprecated) description via Display, but I get your point, thanks.

SimonSapin commented 5 years ago

description returns a &str that borrows &self (not for example a String) so it can’t be implemented based on Display (only the other way around is possible).

You can click on the [src] link in https://doc.rust-lang.org/std/error/trait.Error.html to quickly see the body of default methods.