sanic-org / sanic

Sanic closes connection prematurely #2614

Open · nfergu opened 1 year ago

nfergu commented 1 year ago

Describe the bug

When a client downloads data slowly from a Sanic server, Sanic terminates the connection before all data has been downloaded. Specifically, the connection is terminated before all of the data specified by the content-length header has been downloaded.

I previously raised this as #2613, but that was closed. I'm raising this again because I'm pretty sure this is a bug.

Code snippet

Run the following Sanic server:

from sanic import Sanic
from sanic.response import raw

app = Sanic("MyHelloWorldApp")
# Raise the request/response timeouts so they are not the limiting factor here
app.config.REQUEST_TIMEOUT = 2000
app.config.RESPONSE_TIMEOUT = 2000

@app.get("/foo")
async def foo_handler(request):
    print(f"HEADERS: {request.headers}")
    # Build a 500 MiB payload and return it as a single raw response
    data = bytearray(500 * 1024 * 1024)
    return raw(data)

if __name__ == "__main__":
    app.run(debug=True)

In a separate process, run this client application:

from time import sleep

import requests
import logging

logging.basicConfig(level=logging.DEBUG)

url = 'http://127.0.0.1:8000/foo'
s = requests.Session()
# Stream the response so the body is read from the socket incrementally
r = s.get(url, stream=True)
content_length = int(r.headers["content-length"])
print(f"Content length: {content_length}")
total = 0
for chunk in r.iter_content(chunk_size=1024 * 1024):
    total += len(chunk)
    # Sleep between chunks to simulate a slow client
    sleep(0.1)
    print(f"Total downloaded: {total}")

assert total == content_length, f"Expected {content_length} bytes, but got {total} bytes"

Expected Behavior

I would expect the amount of data specified by the content-length header to be downloaded. However, the client fails with an error like this:

AssertionError: Expected 524288000 bytes, but got 222742503 bytes

Note that the amount of data downloaded is not always the same, but is reliably less than the length specified by the content-length header on my machine.

A Wireshark capture appears to show the server sending a FIN while the client is still downloading data.

[Wireshark capture screenshot, 2022-12-07 10:45:56]

How do you run Sanic?

As a script (app.run or Sanic.serve)

Operating System

macOS 12.2.1 (also appears to happen on Linux in our production system)

Sanic Version

22.9.1 (also tried 21.12.1 and it happens there too)

Additional context

This does not happen on the old version of Sanic we were using previously (19.12.5).

Increasing Sanic's KEEP_ALIVE_TIMEOUT changes the behaviour but does not fix it: downloading the data slightly more slowly still reproduces the issue. For example, setting KEEP_ALIVE_TIMEOUT to 75 seconds and the client's sleep time to 0.5 seconds still reproduces it.

ahopkins commented 1 year ago

@nfergu I still do not see this as a bug. Your handler is taking too long to push that much data through the socket. Increasing the timeout is the correct approach if you want to send it all at once. Instead, I would suggest streaming the data. In that case there is no timeout, and the connection will close once the buffer is drained.

@app.get("/foo")
async def foo_handler(request):
    print(f"HEADERS: {request.headers}")
    data = bytearray(500 * 1024 * 1024)
    size = 1024 * 1024
    content_length = len(data)
    # Start a streaming response; the handler is now responsible for sending the body
    response = await request.respond(
        headers={"content-length": content_length}
    )
    # Push the payload to the client one 1 MiB slice at a time
    for chunk in range(0, content_length, size):
        await response.send(data[chunk : chunk + size])

Tronic commented 1 year ago

Even better, use a websocket for large file transfers, or, if websockets are not available, make separate requests to transfer the data in small chunks. This is especially useful for video streaming and similar cases, where the user may want to seek without the browser first loading the whole video file into RAM.
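
For illustration, a rough sketch of the websocket approach might look like the following. The /ws_download route name and the 1 MiB chunk size are arbitrary choices, and a real client would need to connect over a websocket and reassemble the chunks itself:

from sanic import Sanic

app = Sanic("MyHelloWorldApp")

@app.websocket("/ws_download")
async def ws_download(request, ws):
    # Same 500 MiB payload as above, pushed over the websocket in 1 MiB pieces
    data = bytearray(500 * 1024 * 1024)
    size = 1024 * 1024
    for offset in range(0, len(data), size):
        # ws.send accepts bytes; convert the bytearray slice explicitly
        await ws.send(bytes(data[offset : offset + size]))

if __name__ == "__main__":
    app.run(debug=True)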

nfergu commented 1 year ago

@ahopkins Thanks very much for your advice. Your solution does indeed solve the problem.

I'm still a bit unclear as to why this works, though. My understanding is that my original code is timing out because it sends too much data in one go. However, if I modify your solution to send all of the data in a single call, it still works:

from sanic import Sanic

app = Sanic("MyHelloWorldApp")
app.config.REQUEST_TIMEOUT = 2000
app.config.RESPONSE_TIMEOUT = 2000

@app.get("/foo")
async def foo_handler(request):
    print(f"HEADERS: {request.headers}")
    data = bytearray(500 * 1024 * 1024)
    content_length = len(data)
    response = await request.respond(
        headers={"content-length": content_length}
    )
    # Send the entire payload to the client in a single call
    await response.send(data)

if __name__ == "__main__":
    app.run(debug=True)

All of the data is downloaded successfully. As I understand it, this should be roughly equivalent to my original code, so there seems to be a difference between the streaming API and returning the response from the handler.

ahopkins commented 1 year ago

Yes, perhaps this is something that should be documented. When you build and return an HTTPResponse from a handler, there is a task timer that tracks how long the request/response cycle takes; it is meant as a safety mechanism. When you instead create a response with request.respond, you take on responsibility for sending the bytes and for the lifetime of the response. As long as that handler function is still running, it will not be cancelled. This is how streaming works: you have an indefinite amount of time to push data and are no longer bound by the timeouts.
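
To make the distinction concrete, here is a minimal sketch contrasting the two styles in one app (the route names are hypothetical and the payload size mirrors the examples above). The /returned handler is subject to the task timer described above, while the /streamed handler owns the send loop via request.respond:

from sanic import Sanic
from sanic.response import raw

app = Sanic("TimeoutDemo")

@app.get("/returned")
async def returned_handler(request):
    # Sanic sends this returned response itself, under the request/response task timer
    return raw(bytes(500 * 1024 * 1024))

@app.get("/streamed")
async def streamed_handler(request):
    # The handler takes over sending, so it is not cancelled while it is still running
    data = bytes(500 * 1024 * 1024)
    response = await request.respond(headers={"content-length": len(data)})
    size = 1024 * 1024
    for offset in range(0, len(data), size):
        await response.send(data[offset : offset + size])
    await response.eof()  # explicitly finish the streamed response

if __name__ == "__main__":
    app.run(debug=True)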

nfergu commented 1 year ago

Thanks @ahopkins. This behaviour was not intuitive for me at least, so perhaps it should be documented, as you say.

I guess another aspect I found confusing is that this timeout appears to be controlled by Sanic's KEEP_ALIVE_TIMEOUT. My understanding was that keep-alive timeouts control how long a connection is kept open between requests, not how long the server has to send a response. From the Sanic docs:

Keep-Alive is an HTTP feature introduced in HTTP 1.1. When sending an HTTP request, the client (usually a web browser application) can set a Keep-Alive header to tell the HTTP server (Sanic) not to close the TCP connection after it has sent the response. This allows the client to reuse the existing TCP connection for subsequent HTTP requests, and ensures more efficient network traffic for both the client and the server.

In addition, I can turn off keep-alive entirely, but the timeout doesn't occur immediately in this case.

There are obviously REQUEST_TIMEOUT and RESPONSE_TIMEOUT, but these are set to a very large value in my test, so I don't think they are controlling this behaviour (and the docs for them don't seem to match what I'm seeing).
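
For reference, the settings discussed in this thread are all plain app.config values; a consolidated sketch with the numbers mentioned above (illustrative only, not recommendations):

from sanic import Sanic

app = Sanic("MyHelloWorldApp")

# Raised in the original reproduction so these are not the limiting factor
app.config.REQUEST_TIMEOUT = 2000
app.config.RESPONSE_TIMEOUT = 2000

# Raising the keep-alive timeout delays the early close but does not prevent it
app.config.KEEP_ALIVE_TIMEOUT = 75

# Keep-alive can also be disabled entirely, though the timeout then reportedly
# still does not occur immediately
# app.config.KEEP_ALIVE = False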

ahopkins commented 1 year ago

@nfergu Yes, you have (somewhat embarrassingly for us) hit upon something that @Tronic and I have discussed in the past: the timeout controls are not entirely accurate and are perhaps more complicated than they need to be. We plan to address this, but not until sometime next year, because it will likely be a breaking change.