Open rintcius opened 2 years ago
Related to http4s/http4s-okhttp-client#5 - okhttp has same issue, but blaze, ember and jetty don't lose the entity
There's a broader discussion to be had about this backend: upstream has been unmaintained since June.
withHeaders
replaces all headers. putHeaders
replaces headers of the same name. (This is a good use of "put" if we think of it like a Map, and a bad use of "put" if we think of it like HTTP.)
I guess the broader question here is: is the goal of http4s to make switching from one client to another as frictionless as possible?
I.e. currently, if someone constructed a request like req.withEntity(..).withHeaders(..)
using blaze/ember/jetty then things work as expected. However if they decide to switch to async-http-client or okhttp then things will break.
is the goal of http4s to make switching from one client to another as frictionless as possible
Yes, but "frictionless" is open to some interpretation as we get into undefined behavior.
When we call withHeaders
after a withEntity
of known size, we obliterate the Content-Length
header. putHeaders
preserves it. (The same happens for streaming bodies, with Transfer-Encoding
in place of the Content-Length
header).
scala> Request[IO]().withEntity("test").headers.foreach(println)
Content-Length: 4
Content-Type: text/plain; charset=UTF-8
scala> Request[IO]().withEntity("test").withHeaders("X-Foo" -> "bar").headers.foreach(println)
X-Foo: bar
scala> Request[IO]().withEntity("test").putHeaders("X-Foo" -> "bar").headers.foreach(println)
Content-Length: 4
Content-Type: text/plain; charset=UTF-8
X-Foo: bar
Everything here so far is core behavior, independent of backend. But there are two uncomfortable cases to consider:
A client SHOULD NOT generate content in a GET request unless it is made directly to an origin server that has previously indicated, in or out of band, that such a request has a purpose and will be adequately supported.
This applies to all three requests above. http4s doesn't implement any in-band protocol for this, nor can it know about an out-of-band protocol. If we always drop the bodies, the backends are most consistent, but http4s can no longer be used to communicate "purposefully" with such servers. If we try to send the body, we find that some of our backends crash.
We can and should increase portability by guaranteeing that all of these requests are sent successfully. If we go one step further and specify that they are sent without content, we get more portability, at the expense of trusting the user with a feature the spec permits.
A user agent SHOULD send Content-Length in a request when the method defines a meaning for enclosed content and it is not sending Transfer-Encoding. For example, a user agent normally sends Content-Length in a POST request even when the value is 0 (indicating empty content). A user agent SHOULD NOT send a Content-Length header field when the request message does not contain content and the method semantics do not anticipate such data.
That means that withEntity / withHeaders
constructs a request it SHOULD not for methods that support content. When a method anticipates content, an http4s backend can fill in a Transfer-Encoding
to cover for the user error. But should GET requests infer headers when it's dubious whether they should send content at all?
https://github.com/http4s/http4s/pull/6117 is in agreement with the upper part of your comment, right? In particular, because now it doesn't take that step further anymore.
That means that withEntity / withHeaders constructs a request it SHOULD not for methods that support content. When a method anticipates content, an http4s backend can fill in a Transfer-Encoding to cover for the user error.
So you prefer this to be solved in the backends rather than core? Asking because conceptually I like the idea of entity and headers becoming more orthogonal. Because they are both separate fields it's not intuitive that there's actually a dependency between them.
If the headers would go in 2 buckets (inferred headers and explicitly set headers) then this dependency is at least reduced. withHeaders
would still work as-is for the most part, it just leaves inferred headers alone as much as possible:
scala> Request[IO]().withEntity("test").withHeaders("X-Foo" -> "bar").withHeaders("Y-Foo" -> "baz").headers.foreach(println)
Content-Length: 4
Content-Type: text/plain; charset=UTF-8
Y-Foo: baz
In this way req.withEntity(e).withHeaders(hs)
, req.withHeaders(hs).withEntity(e)
and Request[IO](req.method, req.uri, entity = e, headers = hs)
would typically be the same (unless you explicitly set a header that was already inferred otherwise).
But should GET requests infer headers when it's dubious whether they should send content at all?
Could GET requests follow the same strategy as methods that do support content? I.e. these headers are only inferred if the entity is set. For GET it isn't set usually so it won't infer any headers, but if it is set they will be inferred just like e.g. a POST with contents.
See https://github.com/rintcius/http4s-clients-poc/blob/master/src/test/scala/com/github/rintcius/poc/http4s/client/PostReqsSuite.scala#L68-L80 for repro