servo / rust-url

URL parser for Rust
https://docs.rs/url/
Apache License 2.0

Expose more parser configuration #667

Open sdroege opened 3 years ago

sdroege commented 3 years ago

Currently the parser handles http/https/ws/wss/ftp/file in a special way, with no way to apply the same behaviour to other URI schemes or to configure it in more detail.

For example:

https://github.com/servo/rust-url/blob/21aed6e18c129de019a4bf375f095231ef9daab3/url/src/parser.rs#L158-L180

https://github.com/servo/rust-url/blob/21aed6e18c129de019a4bf375f095231ef9daab3/url/src/parser.rs#L1439

https://github.com/servo/rust-url/blob/21aed6e18c129de019a4bf375f095231ef9daab3/url/src/parser.rs#L182-L189

Especially the HTTP behaviour to always have a / following the origin part (e.g. http://example.com?foo=bar is wrong but http://example.com/?foo=bar is correct) would be useful for many other schemes too.
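To make the request concrete, here is a minimal sketch of the slash-after-authority rule applied to a caller-supplied list of schemes. The function `normalize_empty_path` and its `slash_schemes` parameter are hypothetical names used for illustration only; they are not part of the `url` crate, and the string handling is deliberately simplified (it only looks at `scheme://authority` forms and does no validation).

```rust
/// Hypothetical helper, NOT part of the `url` crate: if the scheme is listed
/// in `slash_schemes` and the path after the authority is empty, insert "/".
fn normalize_empty_path(input: &str, slash_schemes: &[&str]) -> String {
    // Split "scheme://rest"; anything without "://" is returned unchanged.
    let (scheme, rest) = match input.split_once("://") {
        Some(parts) => parts,
        None => return input.to_string(),
    };

    // Only schemes the caller opted in get the HTTP-like treatment.
    if !slash_schemes.iter().any(|s| s.eq_ignore_ascii_case(scheme)) {
        return input.to_string();
    }

    // The authority ends at the first '/', '?' or '#', or at the end of input.
    let end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let (authority, tail) = rest.split_at(end);

    // A path is already present, so there is nothing to do.
    if tail.starts_with('/') {
        return input.to_string();
    }

    format!("{}://{}/{}", scheme, authority, tail)
}

fn main() {
    // Assumption for illustration: the caller wants rtsp treated like http.
    let schemes = ["http", "https", "rtsp"];
    // prints "rtsp://example.com/?foo=bar"
    println!("{}", normalize_empty_path("rtsp://example.com?foo=bar", &schemes));
}
```

A real implementation would presumably hook into the parser and serializer rather than post-process strings, but the sketch shows the kind of per-scheme switch being asked for.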

sdroege commented 3 years ago

> Especially the HTTP behaviour to always have a / following the origin part (e.g. http://example.com?foo=bar is wrong but http://example.com/?foo=bar is correct) would be useful for many other schemes too.

I guess that also affects the serializer, not just the parser.

valenting commented 3 years ago

That's because the URL Spec defines those schemes as special, and no others.

sdroege commented 3 years ago

Other specs built on top of that or on RFC 3986 (e.g. RFC 7826) can and do define similar rules, though. It seems limiting if this is not customizable here.

tmccombs commented 3 years ago

> That's because the URL Spec defines those schemes as special, and no others.

Which makes sense in the context of the browser, but not as much in other contexts. (Although even in the browser, this functionality may be useful).

Maybe this should be filed as a bug/feature request against the WHATWG spec?

djc commented 3 years ago

I don't think the WHATWG is interested in implementing specs for non-browser purposes, and per my current understanding there aren't a lot of resources devoted to further development of the URL standard anyway.

For context, there is also very limited maintenance on this repository and something of a backlog wrt spec compliance, so it's not clear that it makes sense to prioritize configurability outside the design goal of tracking the URL standard. (Personally I do think it might make sense to enable configuration to change the set of special schemes, but it hasn't been very rewarding to me to do reviews and bug fixes for this crate when the only active maintainer is unable to spend much time on it.)

sdroege commented 3 years ago

Well, I would be happy to implement something for this issue, but if it's unlikely to get reviewed I can continue working around it on my side too.

With this crate being the de facto URL/URI crate in Rust, it seems problematic that there's a lack of maintainers. It might be a good idea to ask for new people to get involved here via the usual channels, like TWIR.

tmccombs commented 3 years ago

It's also problematic that the de facto Rust crate, which is used for many non-browser applications, is so closely tied to a browser-specific spec (and one which contradicts older RFCs at that).

sdroege commented 3 years ago

That's certainly true too.