effective request uri - Githubissues

pyfisch commented 8 years ago

I've testet rotor-http a bit and the overall design is really good however I do not like the handling of request URIs.

Currently rotor parses the the request-target from the request line into a RequestUri enum provided by hyper. The request uri usually contains the absolute path as a string. But this string is often hard to use since it must be percent-decoded and separated from the query part manually. I propose to provide a Head.request_url() -> Url method that provides the effective request uri as described in RFC7230. The raw request target should be still provided but for almost anything consumers of rotor-http could use a standard rust-url url providing all the information stored in the the Host header field and the request-target string.

If this change it wanted I will provide a PR.

tailhook commented 8 years ago

Well, I have few questions about effective request uri:

What is default name for "authority" should be used? Should it be optional? Can it be different for different requests?
What if request uri is a full url, and there is a Host header which is different?
We can't respond the request solely based on the effective request URI because the path may be * which is kinda ignored in effective request uri, right?

I think fixing (2) and (3) needs a more complex validation of the request, which is not currently in rotor-http anyway. The library tries to be as small as possible, the only things that we validate are about security (i.e. parsing the request body well so that simple attacks for cache poisoning are impossible).

I believe, that there will be more layers on top of rotor http, for example: to make simple JSON API responder I'd prefer to always return buffered requests and have a fixed request timeout. This allows making a simplified handler protocol having only two or three methods instead of current 8 ones. And this layer might have request uri normalization (or might not have if it doesn't care about hostname).

What I believe would be beneficial for now is adding a docstring which describes how to construct effective request uri from the Head structure.

pyfisch commented 8 years ago

All HTTP/1.1 must contain a Host header field, so every request constains an "authority" (except when there is an empty Host header provided. In this case you can handle the request like HTTP/1.0). For HTTP/1.0 and HTTP/0.9 requests I used the "0.0.0.0" IP-adress as a fallback if no authority was provided in kinglet. The RFC allows the server to define a fixed authority but this is not necessarry, you can simply handle all hosts in the same way if you want to ignore authority. A HTTP server may handle requests for many different authorities, the Host header field is used to distinguish between them.
Proxies must ignore the Host header if the request target is a full URI. Servers usually do not receive full URIs but I would say they should do the same.
If the effective request uri as defined in RFC7230 is a string. The URI http://example.org has no path (*) , the URI http://example.org/ retrieves the document under /. Rust-url returns the same data for both cases but as described in servo/rust-url#134 one can use the empty path vector to describe a server wide request. The path / is represented as vec!["".to_owned()].

If rotor-http should not construct the effective request URI I propose not to parse request-target so consumers have all control about how to use it. The RequestUri already loses information about server-wide options requests at proxies. http://www.example.org:8001 and http://www.example.org:8001/ are the same for hyper. Not parsing the request-target also makes it easier in other cases to construct the correct URIs.

A more general question: Hyper contains a lot of code and depends on many other crates, but rotor-http only uses a small part of it. Most abstractions like parsed headers can also be implemented in higher-level crates. Is it planned to remove the dependency on hyper, or should it remain part of rotor-http?

tailhook commented 8 years ago

If rotor-http should not construct the effective request URI I propose not to parse request-target so consumers have all control about how to use it.

Yes, I like this more. Especially because we can return a slice of a buffer instead of allocating strings. (Rotor is currently structured in a way that allows Head to borrow from the buffer easily).

A more general question: Hyper contains a lot of code and depends on many other crates, but rotor-http only uses a small part of it. Most abstractions like parsed headers can also be implemented in higher-level crates. Is it planned to remove the dependency on hyper, or should it remain part of rotor-http?

I'm not sure yet. I'm in favor of the type-safe abstractions, but the downsides are:

Large number of dependencies which are useless for rotor-http
Excessive memory allocations (we can borrow slices in buffer instead of allocating strings for many things)
For simple proxy kind of use cases (including forwarding using a different protocol like fastcgi), we deserialize then serialize headers again. Which is mostly useless.

The upsides are:

We don't have to implement our own parser for Content-Length and Transfer-Encoding both for requests and responses (and we don't scan the header list twice)
For responses we also sure that header is valid (i.e. doesn't contain something like '\r\n' inside)

While RequestUri and Version are easy to reimplement. Method and StatusCode require up to date "database" of things. The Header thing is even harder to keep up to date. These things I'd like to defer to another authority.

I probably have to mention that I've proposed to split type-safe part from other stuff in hyper multiple times (hyperium/hyper#395)

pyfisch commented 8 years ago

Method can be a plain string, since the method is case-sensitive you can matching for specific methods is simple. The status code can be stored as u16. Higher level libs will probably want to use better abstractions but these do not need to be provided by rotor-http.

As you say headers are a bit more complicated but the headers rotor-http needs to understand are very few: Content-Length, Expect, Transfer-Encoding. The can be parsed without a larger library.

rotor-http will have to allow users to set raw headers because there are many exotic headers and not everyone will want to write a typed header handler for simple headers with an eventually static value. And hyper does not really check if headers contain \r\n. It relies upon the individual header fields to handle that. This means if you find a single typed header that does not check for \r\n you can insert arbitrary headers. For rotor-http it would be easy to check if a header contains illegal bytes.

Splitting hyper probably won't happen. You are not the first one to propose this (I myself also proposed it). Hyper should be THE http lib so from the perspective of the hyper owners there is no need for a separate type-safe crate. See hyperium/hyper#189 and hyperium/hyper#317.

tailhook commented 8 years ago

Okay, all of this makes sense. But let me finish prototyping a client implementation before we start refactoring these things.

tailhook commented 8 years ago

Okay, here is relevant code in client: https://github.com/tailhook/rotor-http/blob/master/src/client/parser.rs#L58 https://github.com/tailhook/rotor-http/blob/master/src/headers.rs The scan_headers looks like overengineered, and perhaps can be simplified. Otherwise, I believe we can get rid of hyper. Just need to be careful, doing performance benchmarks at each step.

tailhook commented 8 years ago

Well, the issue is implemented AFAICS. Further discussion of getting rid of hyper is in #15

tailhook / rotor-http

effective request uri #8