softwaremill / sttp

The Scala HTTP client you always wanted!
https://sttp.softwaremill.com
Apache License 2.0
1.44k stars 301 forks source link

Sttp fails with EOFException when receiving empty gzip-encoded response #1802

Open baldram opened 1 year ago

baldram commented 1 year ago

This is a test that is designed to reproduce a bug in the Sttp HTTP client library. Specifically, the bug occurs when a simple API call is made that responds with a Content-Encoding of gzip, and an empty response in combination with Created 201, and Accepted 202 codes.

When this occurs, the Sttp library fails with a java.io.EOFException (GZIPInputStream.java:268). The exception is not observed in combination with NoContent 204 code.

To reproduce the error and see the bug in action, run the following command:

$ scala-cli test https://gist.github.com/baldram/cfcb9203baaf968107f1d3da44ae1150

The test demonstrates the issue using various backends. Both the client3 (<= 3.8.14) and the new client4 (4.0.0-M1) are affected. The problem is reproducible in JVM 11+ and also in JVM 8 when using the simple TryHttpURLConnectionBackend.

The exception occurs when trying to read the response from the server, and can be triggered by any backend. The exception is not observed when using the ZIO 1 backend.

When commenting the line .withHeader("Content-Encoding", "gzip") in WireMock stub, all tests are green.

With the gzip header, the error responses are as follows:

For simple client:

Caused by: java.io.EOFException
    at java.base/java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:268)
    at java.base/java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:258)
    at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:164)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
    at sttp.client3.HttpClientSyncBackend.standardEncoding$$anonfun$1(HttpClientSyncBackend.scala:65)

For Cats:

Caused by: java.io.EOFException
    at java.base/java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:268)
    at java.base/java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:258)
    at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:164)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
    at sttp.client3.httpclient.cats.HttpClientCatsBackend.standardEncoding$$anonfun$1(HttpClientCatsBackend.scala:64)

The test snipped is available here: https://gist.github.com/baldram/cfcb9203baaf968107f1d3da44ae1150.

Are there any potential solutions to mitigate the problem? For example, I tried using quickRequest.post(apiUrl).response(IgnoreResponse).send(backend) but was unsuccessful.

If I receive any suggestions, I may attempt to work on a merge request to fix the issue, although my first attempt was unsuccessful.

Now, I consider using PushbackInputStream to read the first byte and determine if the stream is empty. It could be an effective solution if this approach works well with the HttpResponseInputStream currently in use. Please let me know if you find this suggestion suitable or if you have any alternative ideas in mind.

Next, the HttpClientBackend features a readResponse method, which we could potentially extend. This approach resembles existing checks, like the one for the HEAD method, that actively prevent decoding the response body.

PS: Similar is not an issue in case of JAX-RS/Jersey client.

adamw commented 1 year ago

Maybe we should check if the Content-Length header is not 0 before attempting to decode the body here: https://github.com/softwaremill/sttp/blob/e7ed7720a3ae4389c4b755b476e94bafb26ba694/core/src/main/scalajvm/sttp/client4/httpclient/HttpClientBackend.scala#L91 ? (and similarly in other backends)

baldram commented 1 year ago

check if the Content-Length header is not 0 before attempting to decode the body

I have yet to mention the absence of the Content-Length header in this scenario, so we cannot perform the said check. (The attached test simulates this situation).

Additionally, the lack of this header is not a reason to skip decoding the body. For example, the non-empty gzip response without a Content-Length header is processed successfully. That's why I suggest physically checking the content non-emptiness by attempting to read the first byte. Do you have any other suggestions or ideas for handling such situations?

adamw commented 1 year ago

That's some pathological response (without content-length) ;)

The problem here would be how you put the byte back. It's not possible neither in input streams, or reactive streams. So you'd have to somehow pass it further down explicitly?

adamw commented 1 year ago

But maybe a simpler solution would be replacing GZipInputStream with a custom implementation which correctly handles empty bodies, and delegates to the java one otherwise

baldram commented 1 year ago

That's some pathological response (without content-length) ;)

Yes, I agree :)

The problem here would be how you put the byte back.

As mentioned at the end of the ticket description, I am considering using the PushbackInputStream. It offers the ability to "unread" one byte.

a custom implementation which correctly handles empty bodies

If the Java standard is insufficient, that would be the next step.

adamw commented 1 year ago

Yeah, but wouldn't a thin wrapper on GZipInputStream solve the issue, without the need to peek at the first input byte?

baldram commented 1 year ago

Yes, I think we can try using such a thin-wrapper. The problem occurs deeper within, but if we could do only the necessary minimum and successfully delegate further processing to Java appropriately, it might work. I would try.