hyperium / http

Rust HTTP types
Apache License 2.0
1.16k stars 291 forks

URI too long verification #335

Open Pzixel opened 5 years ago

Pzixel commented 5 years ago

I have a question about this check

https://github.com/hyperium/http/blob/78058fdd94f209304660f17c067bf30f253df298/src/uri/mod.rs#L265-L270

The problem is that I have to call a server which only accepts data via the query string. Eventually the URI grows past u16::MAX_VALUE, so I'm unable to send the request at all. I have to work around it in ugly ways (e.g. by shelling out to curl), and I don't see any reason to validate URI length on the client side, because:

  1. I didn't find anything in the standards about limiting the maximum length of a request URI.
  2. Even if such a limit existed, in real life there are servers that will process requests of unlimited length.

Maybe this limit could be removed or relaxed? My current URL length is around 10MB.


In my case the server accepts requests of arbitrary length (and it's the only way to communicate with it). Rust is intended to be a systems language that lets you perform any task (possibly with some unsafe), but in this case it can't be done. I agree that this is probably a good check in the default case, but there is no workaround if I really know what I'm doing and need to exceed this limit.
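To make the limit concrete, here is a minimal sketch of the kind of client-side length check being discussed. This is not the crate's actual code (the real check lives in the `src/uri/mod.rs` lines linked above); the constant and function name here are illustrative:

```rust
// Sketch of a client-side URI length check on the order of u16::MAX;
// the exact constant in the http crate may differ slightly.
const MAX_LEN: usize = u16::MAX as usize - 1;

fn check_uri_len(s: &str) -> Result<(), &'static str> {
    if s.len() > MAX_LEN {
        Err("URI too long")
    } else {
        Ok(())
    }
}

fn main() {
    // A ~10MB URI like the one described above is rejected outright.
    let huge = format!("http://example.com/?{}", "x".repeat(10 * 1024 * 1024));
    assert!(check_uri_len(&huge).is_err());
    assert!(check_uri_len("http://example.com/").is_ok());
}
```

With a check of this shape there is no escape hatch: once the string crosses the threshold, parsing fails before any request is made.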

carllerche commented 5 years ago

In practice, virtually all HTTP servers that I know of limit the max size of the HTTP headers to an amount smaller than u16::MAX_VALUE. Which HTTP server are you using that supports a 10MB URL?

Pzixel commented 5 years ago

I think there are many implementations which won't limit maximum request size. In my case I'm requesting https://github.com/Project-OSRM/osrm-backend


Although it makes sense to ask them to allow passing big payloads via the HTTP body, it also makes sense to have some way to talk to bad/broken servers from a systems-language library that the entire Rust ecosystem depends on.

Pzixel commented 5 years ago

So, the problem in a nutshell:

  1. The current implementation limits requests on the client side to u16::MAX bytes, although some servers are fine with longer requests.
  2. All Rust request libraries (such as hyper/reqwest/...) are based on http, so I can't just pick another one that skips this check. It's effectively impossible to query such a server from Rust!
  3. There is no reason for such a limitation. If the server can't handle the request, it will reject it while handling it, maybe a bit later, but it will. If it can handle it, there is no reason for the limit.

It looks like a premature limit, so I suggest relaxing it or removing it altogether.

seanmonstar commented 5 years ago

Many, many servers would not allow a 10MB request-target normally (it's preferable to send that data in the body, and use Expect: 100-continue to avoid wasting bandwidth).

In this case, I believe the size was used for optimizations in the structure layout. I can look in more detail in a couple days.

Pzixel commented 5 years ago

Many, many servers would not allow a 10MB request-target normally (it's preferable to send that data in the body, and use Expect: 100-continue to avoid wasting bandwidth).

I agree that it's not a common case, but it happens sometimes. Another example is OData: if you happen to have an SQL query like WHERE ID IN ( %lots_of_ids_here% ), it gets serialized into the query string, so eventually you may find yourself past u16::MAX. And all servers I know of have this parameter configurable. These edge cases have to be addressed somehow, even if they happen rarely.
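As a rough illustration of how quickly an IN-list blows past the limit (the filter shape and id counts here are hypothetical, not from any particular service):

```rust
fn main() {
    // Hypothetical OData-style filter: 6000 ten-digit ids joined by commas.
    let ids: Vec<String> = (0..6000).map(|i| format!("{:010}", i)).collect();
    let query = format!("$filter=ID in ({})", ids.join(","));

    // Each id contributes ~11 bytes (10 digits plus a comma), so only a few
    // thousand ids already exceed the u16::MAX (65535) ceiling.
    assert!(query.len() > u16::MAX as usize);
}
```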

In this case, I believe the size was used for optimizations in the structure layout. I can look in more detail in a couple days.

Looking forward to it 👍

carllerche commented 5 years ago

The size limit was placed to reserve the ability to shrink the Uri struct, which is quite large.

AFAIK, the only way to support larger URLs w/o forcing the struct size to be bigger for all would be a feature flag.

Pzixel commented 5 years ago

The size limit was placed to reserve the ability to shrink the Uri struct, which is quite large.

Are we talking about 2 extra bytes per Uri instance (which get wasted to alignment anyway unless we have arrays of them), or am I missing something?

carllerche commented 5 years ago

There isn't only one u16 per Uri; there are many. The offset, length, etc. of each component needs to be tracked.
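The space argument can be sketched like this. The field names below are hypothetical, not the crate's actual layout, but each tracked component contributes one index of the chosen width:

```rust
use std::mem::size_of;

// Hypothetical offset tables; the real http::Uri layout differs.
#[allow(dead_code)]
struct OffsetsU16 {
    scheme_end: u16,
    authority_end: u16,
    host_start: u16,
    host_end: u16,
    path_end: u16,
    query_start: u16,
}

#[allow(dead_code)]
struct OffsetsU32 {
    scheme_end: u32,
    authority_end: u32,
    host_start: u32,
    host_end: u32,
    path_end: u32,
    query_start: u32,
}

fn main() {
    // Widening every index doubles the bookkeeping,
    // not just "2 extra bytes" per Uri.
    assert_eq!(size_of::<OffsetsU16>(), 12);
    assert_eq!(size_of::<OffsetsU32>(), 24);
}
```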

Pzixel commented 5 years ago

In that case I think u16 is fine for storing everything but the query string, since in most real cases that's the main source of big URIs.

So it's possible to keep every length but the query string's in a u16.
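The proposal above can be sketched as a hybrid layout (hypothetical field names again): u16 for every component except the query, whose length alone gets a wider type:

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct HybridOffsets {
    scheme_end: u16,
    host_end: u16,
    path_end: u16,
    // Only the query, the usual source of huge URIs, gets 4 bytes.
    query_len: u32,
}

fn main() {
    // Alignment pads this to 12 bytes, versus 16 for an all-u32
    // version of the same four fields, so most of the saving survives.
    assert!(size_of::<HybridOffsets>() <= 12);
}
```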

Pzixel commented 5 years ago

So, any progress on that?

Pzixel commented 4 years ago

Hello. We've patched this for ourselves for now, but we'd like a feature gate merged upstream so we no longer need to patch dependencies of our dependencies.

It's a critical feature for us, and maintaining our own fork is quite awkward. Having something like a feature-large-uri flag would be great.


I can try implementing it myself if you're willing to merge it.
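A hedged sketch of what such a feature-large-uri gate might look like (the feature and type names are hypothetical; nothing like this exists in the crate):

```rust
// Hypothetical: with the feature off, indices stay u16 and nothing changes;
// with it on, every component index widens to u32.
#[cfg(feature = "large-uri")]
type UriIndex = u32;
#[cfg(not(feature = "large-uri"))]
type UriIndex = u16;

fn main() {
    // Default build: compact indices, same limit as today.
    assert_eq!(std::mem::size_of::<UriIndex>(), 2);
}
```

Since the wider layout only exists behind the flag, default users pay nothing, which seems to address the struct-size concern raised above.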

Pzixel commented 2 years ago

Ping?.. Could we make it a feature gate like len-u32, disabled by default? We'd really like to use the upstream crate instead of our own fork, but we just can't live with the u16 limit: there are too many servers that can only be talked to via long URIs.

I can create a PR with the implementation and tests if you agree that we need this.

Pzixel commented 1 year ago

Hello again. I've just checked HTTP spec: https://datatracker.ietf.org/doc/html/rfc2068

It says:

For definitive information on URL syntax and semantics, see RFC 1738 [4] and RFC 1808 [11]. The BNF above includes national characters not allowed in valid URLs as specified by RFC 1738, since HTTP servers are not restricted in the set of unreserved characters allowed to represent the rel_path part of addresses, and HTTP proxies may receive requests for URIs not defined by RFC 1738.

The HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs.

A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).

Therefore the fact that some large URLs aren't supported looks like a violation of this part of the spec. While I understand the performance-related reasons to limit URL length, we need at least a feature flag to support spec-compliant URLs.

cc @carllerche