servo / rust-url

URL parser for Rust
https://docs.rs/url/
Apache License 2.0

What happened to the `url!` macro? #652

Open Fishrock123 opened 3 years ago

Fishrock123 commented 3 years ago

Use `Url`s any amount in Rust and you'll end up with a decent number of `Url::new("string literal").unwrap()` calls - it would seem a macro of some sort would be better for that, and a search shows that this was indeed once a thing: https://github.com/servo/rust-url/pull/137

I couldn't find any issues/prs for removing it, so, what happened?

I've also ended up writing static URLs with e.g. lazy_static! a bunch, which is also not ideal, since they are evaluated at neither compile time nor startup.

tmccombs commented 3 years ago

How feasible would it be to make Url::parse a const fn?

SimonSapin commented 3 years ago

This morning I had no recollection that this ever existed. I did some archeology, but it's all public on GitHub…


If the only thing you’re worried about is typing Url::new("…").unwrap() a lot, consider adding a fn url(s: &str) -> Url function to your own crate that does that. It doesn’t even need to be a macro.

If you want to do compile-time parsing, these days you’d use a procedural macro rather than a compiler plugin. But more importantly: is that really useful, compared to lazy_static or once_cell?

How feasible would it be to make Url::parse a const fn?

Not feasible in today’s Stable Rust. The first reason that comes to mind is that parsing allocates a String and returns it in a field of Url, but there are probably a bunch of other blockers.

tmccombs commented 3 years ago

But more importantly: is that really useful, compared to lazy_static or once_cell?

I think it would be useful to check that the url is valid at compile time rather than runtime. Although I suppose unit tests could catch that.

no constructor that doesn’t do parsing

Why does no such constructor (or builder pattern) exist?

SimonSapin commented 3 years ago

The set of fields and their types in the Url struct are private implementation details that I’d rather not stabilize.

Fishrock123 commented 3 years ago

If the only thing you’re worried about is typing Url::new("…").unwrap() a lot, consider adding a fn url(s: &str) -> Url function to your own crate that does that. It doesn’t even need to be a macro.

Ideally, whatever it is should be easy enough for newcomers to type, but beyond that I don't care what you actually have to type.

My issue here is when that .unwrap() could panic.

If you want to do compile-time parsing, these days you’d use a procedural macro rather than a compiler plugin. But more importantly: is that really useful, compared to lazy_static or once_cell?

Yes, because both of those are evaluated lazily, as their names imply (i.e. `once_cell::sync::Lazy`). This means a panic could happen at any arbitrary time if someone misspells anything in the URL, particularly if that branch is not exercised often.

For example, in an internal bot we have something like this:

lazy_static! {
    //... more fields
    static ref SLACK_POST_MESSAGE: Url =
        Url::parse("https://slack.com/api/chat.postMessage").expect("Valid Slack Post Message URL");
}

This is, somewhat unfortunately, very easy to type - I say unfortunately because I suspect the engineers I'll be bringing onto Rust projects internally will not immediately grasp what is happening here and will assume it works like a statically declared const in a scripting language, i.e. evaluated before main, which it is not.

I'm not sure what the best approach here would be, but to my imagination whatever it is would produce something like Url::from_parts(Protocol::HTTP, domain_bytes, path_bytes, vec_query_field_bytes) "under the hood".

SimonSapin commented 3 years ago

The URL parser is rather liberal in what string inputs it accepts. Url::parse("htttps://slck.com/…") for example will return Ok, but probably won’t do what you want. Of all things that can possibly go wrong, the parser returning an error is not particularly interesting in my opinion.

Url::from_parts(Protocol::HTTP, domain_bytes, path_bytes, vec_query_field_bytes)

That is not how it works. A fair amount of normalization is done during parsing (as specified in https://url.spec.whatwg.org/), so you can't just take arbitrary bytes (or rather Unicode strings) and put them in. And the internal memory representation is not separate strings but a single string (matching the serialization) together with some integer indices.

RobbieMcKinstry commented 2 years ago

Just following up on this, I think this is a great candidate to make into a const fn, as @tmccombs pointed out. For many SDKs and API client libraries, the library authors know exactly which URLs they plan to support. These URLs are usually represented as &strs and hardcoded.

Turning this into a const fn allows the user to validate the URL at compile-time.

For example, today I wrote the following code:

const AccountBalance: &'static str =  concat!("https://api.kraken.com/0", "/private/Balance");
const Assets: &'static str = concat!("https://api.kraken.com/0", "/public/Assets");
const SystemStatus: &'static str = concat!("https://api.kraken.com/0", "/public/SystemStatus");
const SystemTime: &'static str = concat!("https://api.kraken.com/0", "/public/Time");
const Ticker: &'static str = concat!("https://api.kraken.com/0", "/public/Ticker");

I'd like to take this one step further by parsing the URL at compile-time, that way I don't have to call a helper function every time I want to upgrade a &'static str to a Url.

Lucretiel commented 2 years ago

The first reason that comes to mind is that parsing allocates a String

This is easily solved by changing the payload to a Cow<'static, str>, which is probably worth doing anyway, since so many URLs are hardcoded into the program. As for the rest of it: even if there are plenty of post-parse things that can go wrong (where a URL is "valid" but can't be used to make an HTTP request, for instance), there's still value in learning as soon as possible (e.g., at build time, as a compile error) that a hardcoded URL parse will fail.

I think the Url::from_parts is probably thinking along the right lines; but we'd instead need a #[doc(hidden)] version that takes pre-computed integer indices. These could then be computed by the macro during compilation (and fail with a build error if there's a problem) and fed into an infallible from_parts constructor.

SimonSapin commented 2 years ago

The String stored in Url is not necessarily the same as what was originally given to parse, so I'm not sure how effective Cow would be.

Lucretiel commented 2 years ago

Sure, but it's often the same as what was originally given to parse.

ghost commented 2 years ago

This is annoying.

DenisGorbachev commented 3 months ago

The url! macro for compile-time validation is now available via the url-macro crate.

Disclaimer: I'm the author.