hyperium / hyper

An HTTP library for Rust
https://hyper.rs
MIT License
14.08k stars 1.55k forks source link

`Uri` shouldn't accept curly braces (`{`, `}`) in path parts #3594

Open SergioBenitez opened 3 months ago

SergioBenitez commented 3 months ago

Please see https://github.com/rwf2/Rocket/issues/2733 for a run-down of the concerns. Note that none of Firefox, Chrome, nor Safari believe such a URI is valid, so this does not appear to be an exception made in practice. It is also not spec-conformant.

dswij commented 3 months ago

Seems like the curly braces were made valid after https://github.com/hyperium/http/pull/474 with the comment on why it was accepted.

I agree that most usage doesn't accept curly braces as valid characters. A good alternative would be to split this into two entities: a StrictUri (maybe a better name?) struct that does not accept curly braces and the Uri that's more flexible.

Sebbito commented 3 months ago

JSON embedded in the URI path seems like a crime.

Vagelis-Prokopiou commented 3 months ago

If it is not spec-compliant according to @SergioBenitez comment, why accept them? The fact that some clients do it, should not force a non spec-compliant behavior of the server imo.

seanmonstar commented 3 months ago

It does seem wrong, but it is also seen in real life by servers.

Real life unfortunately tends to differ from specs. Engineering is often trying to find the right balance of which to stick to.

SergioBenitez commented 3 months ago

Are there any more details on exactly which servers or clients we're concerned with? The motivation for the initial change feels unsubstantiated to me. I regard it as peculiar to make a change that impacts an entire ecosystem for the concerns of a single (?) party without substantial motivation or explanation, especially when that change may have security implications.

seanmonstar commented 3 months ago

For some testing, I sent a request with braces to several servers (h2o.examp1e.net, github.com, google.com, even httpbin.org/anything/{anything}), and they all either returned either a 404 or a 200.

especially when that change may have security implications.

I saw the mention of security concerns in the linked issue, but I would never recommend anyone pass untrusted data to a templating engine without escaping it.

It seems unfair to send a message to users implying hyper is insecure because of this.

SergioBenitez commented 3 months ago

I saw the mention of security concerns in the linked issue, but I would never recommend anyone pass untrusted data to a templating engine without escaping it.

It seems unfair to send a message to users implying hyper is insecure because of this.

The README for hyper says:

Tested and correct

There's even emphasis on "correct'. The docs for Uri say:

For HTTP 1, this is included as part of the request line. From Section 5.3, Request Target: Once an inbound connection is obtained, the client sends an HTTP request message (Section 3) with a request-target derived from the target URI.

And specifically, the parsing from_static() parsing method docs say:

Convert a Uri from a static string. This function will not perform any copying, however the string is checked to ensure that it is valid. [..] This function panics if the argument is an invalid URI.

Composed together, this says that Uri implements a "request target" parser from section 5.3 of the HTTP/1 spec, and that it will refuse to parse invalid URIs. As such, if a user believes what's written in the documentation, then they don't need to assume that the URI looks a certain way, they know the data must look a certain way. The only alternative would be to simply disregard hyper's claims of correctness and its own documentation. That seems like an unfair ask of users.

If hyper is going to deviate from specs that it claims to follow, then it should make this very clear and not allude to specifications without clarifying how it deviates.

SergioBenitez commented 3 months ago

For some testing, I sent a request with braces to several servers (h2o.examp1e.net, github.com, google.com, even httpbin.org/anything/{anything}), and they all either returned either a 404 or a 200.

I think it's fine if any particular server wants to accept technically invalid URIs. My concern is largely that if one uses hyper to implement a server, then the choice to accept or not accept technically invalid URIs is largely absent, and worse, they may believe that any URI accepted by hyper is a valid one, given the documentation.

In similar domains, such as HTML parsing, libraries leave this choice to the user for a variety of reasons. html5ever and lol_html both have a variety of flags to control this behavior, for example, and other crates like tl make it very clear that they do not follow the spec strictly. I think hyper following such a model would be ideal.

Vagelis-Prokopiou commented 3 months ago

@SergioBenitez constructed a solid argument imo. The optimal solution would be to be able to control Hyper's behavior regarding the Uri validation/construction. If not, any deviation from the spec should at the very least be clearly communicated to the end user.