ninenines / cowboy

Small, fast, modern HTTP server for Erlang/OTP.
https://ninenines.eu
ISC License
7.31k stars 1.17k forks source link

Connection close issue after upgrading from 2.10 to 2.12 #1654

Open dergraf opened 2 months ago

dergraf commented 2 months ago

After upgrading Cowboy 2.10 to 2.12 we've noticed a different behaviour when downloading a large file using http1.1 over a rather bad internet connection. It's a Java client producing a Exception in thread "main" org.apache.hc.core5.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 119354644; received: 23919978) error. So my guess is the changes related to the timeouts in 2.12 trigger this issue. idle_timeout is unchanged at 5 minutes , so I tried to set the new reset_idle_timeout_on_send which didn't help. Maybe anyone has an idea what could cause the changed behaviour here. Thanks!

essen commented 2 months ago

Could also be 1a175e7b563110c4a47c41f2b9c78cdafbcaf063

* HTTP/1.1 connections would sometimes use the wrong timeout
  value to determine whether the connection should be closed.
  This resulted in connections staying up longer than
  intended. This should no longer be the case.

A git bisect would tell us much. There aren't that many commits and half of them are documentation so it should be straightforward.

dergraf commented 2 months ago

You were spot on with https://github.com/ninenines/cowboy/commit/1a175e7b563110c4a47c41f2b9c78cdafbcaf063 . I've dropped this commit from your main branch and deployed a 2.12', the download worked flawlessly. I'm going to try tomorrow over a even slower internet connection to double check that this was the issue.

dergraf commented 2 months ago

Double checked on the slowest connection I could find. Took me 3 minutes to download the 120MB, the version without the change worked like a charm. ;)

Should I create a PR reverting the change?

essen commented 2 months ago

No we should find where it goes wrong and fix it. It's probably using request_timeout where it should be idle_timeout. It was the opposite before (using idle_timeout when it should have been using request_timeout). Both behaviors are wrong but the commit should be closer to what we want (and it has tests).

My guess is that it doesn't check something properly and thinks there are no ongoing requests, but only debugging would tell us what's really happening.

A good first step would be writing a test that sends a large enough body that fails in the same way as you're seeing it.

riccardomanfrin commented 2 months ago

Hi guys, the problem is with this commit 1a175e7b as it was mentioned It changes the behavior for connections that were long standing in http 1.1

Originally I started cowboy with

:cowboy.start_tls(__MODULE__, cowboy_opts, %{
        idle_timeout: :infinity,
        inactivity_timeout: :infinity,
        env: %{dispatch: compiled_routes}
      })

To revert to the original behavior you have to add the option request_timeout: :infinity, like so:

:cowboy.start_tls(__MODULE__, cowboy_opts, %{
        request_timeout: :infinity,
        idle_timeout: :infinity,
        inactivity_timeout: :infinity,
        env: %{dispatch: compiled_routes}
      })

To be fair it's confusing to me the meaning of these 3 timeouts.. and at least in my mind request_timeout makes me think of the timeout while a request is being processed, not the timeout of the connection itself.

Unfortunately in my case I'm trying to prolong the lifetime of the connection even though there were no ongoing connections, which you might say to be an abuse of the protocol.. but it fits my use case..

I might switch to http2 if you think that the eventual fix is going to not allow me to do the trick in the future.

essen commented 2 months ago

The added option should do what you want, including when the issue in this ticket is fixed. The issue in this ticket is that it sometimes uses the wrong timeout, but if you configure all of them as infinity, no problem.