async-http-client: Calling withHeaders after withEntity removes the just set entity

rintcius commented 2 years ago

      Request[IO](Method.POST, uri"https://httpbin.org/post")
        .withEntity("some body")
        .withHeaders(
          org.http4s.headers.`Content-Type`(MediaType.text.plain)
        )

See https://github.com/rintcius/http4s-clients-poc/blob/master/src/test/scala/com/github/rintcius/poc/http4s/client/PostReqsSuite.scala#L68-L80 for repro

rintcius commented 2 years ago

Related to http4s/http4s-okhttp-client#5 - okhttp has same issue, but blaze, ember and jetty don't lose the entity

rossabaker commented 2 years ago

There's a broader discussion to be had about this backend: upstream has been unmaintained since June.

rossabaker commented 2 years ago

withHeaders replaces all headers. putHeaders replaces headers of the same name. (This is a good use of "put" if we think of it like a Map, and a bad use of "put" if we think of it like HTTP.)

rintcius commented 2 years ago

I guess the broader question here is: is the goal of http4s to make switching from one client to another as frictionless as possible?

I.e. currently, if someone constructed a request like req.withEntity(..).withHeaders(..) using blaze/ember/jetty then things work as expected. However if they decide to switch to async-http-client or okhttp then things will break.

rossabaker commented 2 years ago

is the goal of http4s to make switching from one client to another as frictionless as possible

Yes, but "frictionless" is open to some interpretation as we get into undefined behavior.

When we call withHeaders after a withEntity of known size, we obliterate the Content-Length header. putHeaders preserves it. (The same happens for streaming bodies, with Transfer-Encoding in place of the Content-Length header).

scala> Request[IO]().withEntity("test").headers.foreach(println)
Content-Length: 4
Content-Type: text/plain; charset=UTF-8

scala> Request[IO]().withEntity("test").withHeaders("X-Foo" -> "bar").headers.foreach(println)
X-Foo: bar

scala> Request[IO]().withEntity("test").putHeaders("X-Foo" -> "bar").headers.foreach(println)
Content-Length: 4
Content-Type: text/plain; charset=UTF-8
X-Foo: bar

Everything here so far is core behavior, independent of backend. But there are two uncomfortable cases to consider:

A client SHOULD NOT generate content in a GET request unless it is made directly to an origin server that has previously indicated, in or out of band, that such a request has a purpose and will be adequately supported.

This applies to all three requests above. http4s doesn't implement any in-band protocol for this, nor can it know about an out-of-band protocol. If we always drop the bodies, the backends are most consistent, but http4s can no longer be used to communicate "purposefully" with such servers. If we try to send the body, we find that some of our backends crash.

We can and should increase portability by guaranteeing that all of these requests are sent successfully. If we go one step further and specify that they are sent without content, we get more portability, at the expense of trusting the user with a feature the spec permits.

A user agent SHOULD send Content-Length in a request when the method defines a meaning for enclosed content and it is not sending Transfer-Encoding. For example, a user agent normally sends Content-Length in a POST request even when the value is 0 (indicating empty content). A user agent SHOULD NOT send a Content-Length header field when the request message does not contain content and the method semantics do not anticipate such data.

That means that withEntity / withHeaders constructs a request it SHOULD not for methods that support content. When a method anticipates content, an http4s backend can fill in a Transfer-Encoding to cover for the user error. But should GET requests infer headers when it's dubious whether they should send content at all?

rintcius commented 2 years ago

https://github.com/http4s/http4s/pull/6117 is in agreement with the upper part of your comment, right? In particular, because now it doesn't take that step further anymore.

That means that withEntity / withHeaders constructs a request it SHOULD not for methods that support content. When a method anticipates content, an http4s backend can fill in a Transfer-Encoding to cover for the user error.

So you prefer this to be solved in the backends rather than core? Asking because conceptually I like the idea of entity and headers becoming more orthogonal. Because they are both separate fields it's not intuitive that there's actually a dependency between them.

If the headers would go in 2 buckets (inferred headers and explicitly set headers) then this dependency is at least reduced. withHeaders would still work as-is for the most part, it just leaves inferred headers alone as much as possible:

scala> Request[IO]().withEntity("test").withHeaders("X-Foo" -> "bar").withHeaders("Y-Foo" -> "baz").headers.foreach(println)
Content-Length: 4
Content-Type: text/plain; charset=UTF-8
Y-Foo: baz

In this way req.withEntity(e).withHeaders(hs), req.withHeaders(hs).withEntity(e) and Request[IO](req.method, req.uri, entity = e, headers = hs) would typically be the same (unless you explicitly set a header that was already inferred otherwise).

But should GET requests infer headers when it's dubious whether they should send content at all?

Could GET requests follow the same strategy as methods that do support content? I.e. these headers are only inferred if the entity is set. For GET it isn't set usually so it won't infer any headers, but if it is set they will be inferred just like e.g. a POST with contents.

http4s / http4s-async-http-client

async-http-client: Calling withHeaders after withEntity removes the just set entity #5