Pzixel opened this issue 5 years ago
In practice, virtually all HTTP servers that I know of limit the max size of the HTTP headers to an amount smaller than `u16::MAX_VALUE`. Which HTTP server are you using that supports a 10MB URL?
I think there are many implementations which won't limit maximum request size. In my case I'm requesting https://github.com/Project-OSRM/osrm-backend
Although it makes sense to ask them to allow passing big data via HTTP body it also makes sense to have some way to request bad/broken servers in a system language library that the entire Rust ecosystem depends on.
So, the problem in a nutshell: the crate rejects URIs longer than `u16::MAX`, although some servers are fine with such requests. It looks like a premature limit, so I suggest relaxing it or removing it altogether.
Many many servers would not allow a 10MB request-target normally (it's preferable to send that data in the body, and use `Expect: 100-continue` to avoid wasting bandwidth).
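The alternative suggested here can be sketched in Rust: move the oversized payload into the request body and announce it with `Expect: 100-continue`, so the server gets a chance to refuse before the client uploads everything. The host, path, and payload below are made up for illustration; this builds the raw request head by hand rather than using any particular HTTP client.

```rust
// Sketch: a large payload goes in the body instead of the request-target,
// with `Expect: 100-continue` so a server that rejects it can do so before
// the client wastes bandwidth on the full upload.
// Host and path are hypothetical.
fn build_request_head(path: &str, body_len: usize) -> String {
    format!(
        "POST {path} HTTP/1.1\r\n\
         Host: example.com\r\n\
         Content-Length: {body_len}\r\n\
         Expect: 100-continue\r\n\
         \r\n"
    )
}

fn main() {
    let body = "lots_of_ids=1,2,3"; // stand-in for the real 10MB payload
    let head = build_request_head("/route", body.len());
    assert!(head.contains("Expect: 100-continue"));
    println!("{head}");
}
```

A client that sends this head should then wait briefly for a `100 Continue` (or a final status such as `417`) before transmitting the body.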
In this case, I believe the size was used for optimizations in the structure layout. I can look in more detail in a couple days.
Many many servers would not allow a 10MB request-target normally (it's preferable to send that data in the body, and use `Expect: 100-continue` to avoid wasting bandwidth).
I agree that it's not a common case, but it happens sometimes. Another example is OData: if you happen to have an SQL query like `WHERE ID in ( %lots_of_ids_here% )`, it will be serialized into the query string, so eventually you may run past `u16::MAX`. And all servers I know of have this parameter configurable. These edge cases have to be addressed somehow, even if they happen rarely.
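The OData scenario above is easy to reproduce: serializing an `ID in (...)` filter into the query string blows past `u16::MAX` (65535) bytes with only a few thousand ids. The filter syntax and id counts below are illustrative, not taken from a real service.

```rust
// Sketch: an OData-style `$filter=ID in (...)` clause serialized into a
// query string. With ~10k seven-digit ids the query alone is ~80KB,
// well past the u16::MAX (65535) limit being discussed.
fn build_filter_query(ids: &[u64]) -> String {
    let list: Vec<String> = ids.iter().map(|id| id.to_string()).collect();
    format!("$filter=ID in ({})", list.join(","))
}

fn main() {
    let ids: Vec<u64> = (1_000_000..1_010_000).collect(); // 10,000 ids
    let query = build_filter_query(&ids);
    println!("query length = {} bytes", query.len());
    assert!(query.len() > u16::MAX as usize);
}
```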
In this case, I believe the size was used for optimizations in the structure layout. I can look in more detail in a couple days.
Looking forward to it 👍
The size limit was placed to reserve the ability to shrink the `Uri` struct, which is quite large.
AFAIK, the only way to support larger URLs w/o forcing the struct size to be bigger for all would be a feature flag.
The size limit was placed to reserve the ability to shrink the Uri struct, which is quite large.
Are we talking about 2 extra bytes per Uri instance (which gets wasted with alignment anyway unless we have arrays of them) or am I missing anything?
There isn't only one `u16` per `Uri`, there are many. Each offset, length, etc. for each component needs to be tracked.
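The size concern can be made concrete with a toy layout. This is NOT the actual layout of `http::Uri`; it is a hypothetical struct tracking one offset per component, just to show that widening the index type multiplies across every field:

```rust
use std::mem::size_of;

// Hypothetical component offsets for a Uri-like struct. Field names are
// made up; the point is only that several u16s become several u32s.
#[allow(dead_code)]
struct OffsetsU16 {
    scheme_end: u16,
    host_start: u16,
    host_end: u16,
    port_end: u16,
    path_end: u16,
    query_end: u16,
}

#[allow(dead_code)]
struct OffsetsU32 {
    scheme_end: u32,
    host_start: u32,
    host_end: u32,
    port_end: u32,
    path_end: u32,
    query_end: u32,
}

fn main() {
    println!("u16 offsets: {} bytes", size_of::<OffsetsU16>()); // 12
    println!("u32 offsets: {} bytes", size_of::<OffsetsU32>()); // 24
}
```

So the cost of going to `u32` everywhere is not "2 extra bytes" but roughly doubling the index bookkeeping for every `Uri` instance.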
In this case I think `u16` is fine for storing anything but the query string, since in most real cases it's the main source of big URIs. So it's possible to keep everything but the query-string length in `u16`.
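That compromise can also be sketched with a hypothetical layout (again, not the real `http::Uri`): keep every component offset as `u16` and widen only the query-string length. Alignment of the lone `u32` pads the struct, so the cost lands between the all-`u16` and all-`u32` layouts.

```rust
use std::mem::size_of;

// Sketch of the proposal above: only the query length gets the wider
// type, since the query string is the usual source of oversized URIs.
// Field names are hypothetical.
#[allow(dead_code)]
struct MixedOffsets {
    scheme_end: u16,
    host_start: u16,
    host_end: u16,
    port_end: u16,
    path_end: u16,
    query_len: u32, // only this field needs the extra range
}

fn main() {
    // 5 * 2 + 4 = 14 bytes of fields, rounded up to the u32 alignment:
    println!("mixed offsets: {} bytes", size_of::<MixedOffsets>()); // 16
}
```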
So, any progress on that?
Hello. Currently we have patched this for ourselves, but we'd like to have some feature gate merged upstream to remove the need to patch dependencies of our dependencies. It's a critical feature for us, and it's quite awkward to maintain our own fork. Having something like a `feature-large-uri` flag would be great. I can try implementing it myself if you will merge it.
Ping?.. Could we make it a feature gate like `len-u32` which is disabled by default? We'd really like to use the upstream instead of our own fork, but we just can't live with the `u16` limit: there are too many servers that HAVE to be communicated with via long URIs. I can create a PR with the implementation and tests if you agree that we need this.
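A feature gate like the `len-u32` idea could swap the index width at compile time. The feature name and type alias below are hypothetical, not part of the http crate; this is just a sketch of the mechanism:

```rust
use std::mem::size_of;

// Hypothetical feature gate: with `--features len-u32` the index type
// widens to u32; by default it stays u16, preserving the current layout.
#[cfg(feature = "len-u32")]
type UriIndex = u32;
#[cfg(not(feature = "len-u32"))]
type UriIndex = u16;

fn max_uri_len() -> usize {
    UriIndex::MAX as usize
}

fn main() {
    // Built without the feature, the default u16 limit applies.
    println!(
        "index width: {} bytes, max len: {}",
        size_of::<UriIndex>(),
        max_uri_len()
    );
}
```

One caveat with this approach: Cargo features are additive across a dependency graph, so any crate enabling the feature would widen the type for everyone in the same build.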
Hello again. I've just checked HTTP spec: https://datatracker.ietf.org/doc/html/rfc2068
It says:
For definitive information on URL syntax and semantics, see RFC 1738 [4] and RFC 1808 [11]. The BNF above includes national characters not allowed in valid URLs as specified by RFC 1738, since HTTP servers are not restricted in the set of unreserved characters allowed to represent the rel_path part of addresses, and HTTP proxies may receive requests for URIs not defined by RFC 1738.
The HTTP protocol does not place any a priori limit on the length of
a URI. Servers MUST be able to handle the URI of any resource they
serve, and SHOULD be able to handle URIs of unbounded length if they
provide GET-based forms that could generate such URIs.
A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).
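The spec's expectation can be sketched from the server side: a compliant server enforces its own limit and answers with 414 rather than relying on clients to pre-validate. The limit value below is made up; RFC 7231 later renames the status to "URI Too Long".

```rust
// Sketch: a server-side guard that returns 414 when the request-target
// exceeds what this particular server can handle. The 8KB limit is a
// hypothetical server configuration, not anything from the spec.
fn check_request_target(target: &str, server_limit: usize) -> Result<(), String> {
    if target.len() > server_limit {
        Err("HTTP/1.1 414 Request-URI Too Long\r\n\r\n".to_string())
    } else {
        Ok(())
    }
}

fn main() {
    let long_target = format!("/search?q={}", "x".repeat(100_000));
    match check_request_target(&long_target, 8_192) {
        Err(status) => println!("{status}"),
        Ok(()) => println!("accepted"),
    }
}
```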
Therefore the fact that some large URLs aren't supported looks like a violation of this part of the spec. While I understand the performance-related reasons to limit URL length, we need at least a feature flag that allows spec-compliant URLs.
cc @carllerche
I have a question about this check
https://github.com/hyperium/http/blob/78058fdd94f209304660f17c067bf30f253df298/src/uri/mod.rs#L265-L270
The problem I have is that I have to talk to a server which only accepts data via the query string. Eventually I run past `u16::MAX_VALUE`, so I'm unable to send the request. I have to work around it in ugly ways (e.g. via `curl`), and I don't actually see any reason to validate URI length on the client side. Maybe this limit could be removed or relaxed? My current URL length is around 10MB.
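The guard at the linked lines can be approximated like this. This is a minimal sketch of the idea (parse fails up front when the input can't be indexed by `u16`), not the actual code in `src/uri/mod.rs`:

```rust
// Minimal approximation of a length guard at the start of URI parsing:
// inputs longer than u16::MAX are rejected before any real parsing.
#[derive(Debug)]
struct TooLong;

fn parse_uri(s: &str) -> Result<&str, TooLong> {
    if s.len() > u16::MAX as usize {
        return Err(TooLong);
    }
    Ok(s) // real parsing would continue here
}

fn main() {
    let ten_mb_uri = format!("http://example.com/?data={}", "a".repeat(10 * 1024 * 1024));
    assert!(parse_uri(&ten_mb_uri).is_err()); // a ~10MB URL is rejected
    assert!(parse_uri("http://example.com/").is_ok());
    println!("long URI rejected as expected");
}
```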
In my case the server may accept requests of arbitrary length (and it's the only way to communicate with it). Rust is intended to be a systems language which allows you to perform any task (possibly with some `unsafe`s), but in this case it couldn't be done. I agree that this is probably a good check in the default case, but there is no workaround if I really know what I'm doing and need to extend this limit.