unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi
Other
3.46k stars 691 forks source link

Broken socket / connection reset during GET #2262

Closed sn1572 closed 3 years ago

sn1572 commented 3 years ago

Hello all. I have a small web app that wants to stream video files, but encounters broken sockets / connection resets while trying to transfer .mp4 files.

Note: Using Flask and uwsgi by themselves - no Nginx etc.

Here is a snippet from the logs:

*** Starting uWSGI 2.0.19.1 (32bit) on [Tue Dec  1 15:36:40 2020] ***
compiled with version: 8.3.0 on 17 June 2020 09:19:11
os: Linux-5.4.72-v7l+ #1356 SMP Thu Oct 22 13:57:51 BST 2020
nodename: raspberrypi
machine: armv7l
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
...
spawned uWSGI http 1 (pid: 3008)
{address space usage: 75792384 bytes/72MB} {rss usage: 30023680 bytes/28MB} [pid: 3007|app: 0|req: 1/1] 192.168.1.78 () {42 vars in 1052 bytes} [Tue Dec  1 15:36:45 2020] GET /shared/Gopro/ => generated 24586 bytes in 218 msecs (HTTP/1.1 200) 4 headers in 190 bytes (1 switches on core 0)
{address space usage: 75792384 bytes/72MB} {rss usage: 30150656 bytes/28MB} [pid: 3007|app: 0|req: 2/2] 192.168.1.78 () {40 vars in 976 bytes} [Tue Dec  1 15:36:46 2020] GET /assets/fonts/fontawesome-webfont.woff2?v=4.6.3 => generated 232 bytes in 3 msecs (HTTP/1.1 404) 2 headers in 87 bytes (1 switches on core 0)
get range finds: bytes=0-
start: 0, chunk_size: 131072, length: 11971174
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
[uwsgi-http key: 192.168.1.73:5000 client_addr: 192.168.1.78 client_port: 52393] hr_write(): Connection reset by peer [plugins/http/http.c line 565]
Tue Dec  1 15:36:50 2020 - uwsgi_response_write_body_do(): Connection reset by peer [core/writer.c line 429] during GET /shared/Gopro/GH010004.mp4 (192.168.1.78)
OSError: write error
{address space usage: 75948032 bytes/72MB} {rss usage: 30179328 bytes/28MB} [pid: 3007|app: 0|req: 3/3] 192.168.1.78 () {40 vars in 912 bytes} [Tue Dec  1 15:36:50 2020] GET /shared/Gopro/GH010004.mp4 => generated 2490368 bytes in 67 msecs (HTTP/1.1 206) 4 headers in 133 bytes (20 switches on core 0)
get range finds: bytes=0-
start: 0, chunk_size: 131072, length: 11971174
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes
Yielding 131072 bytes

and it goes on like that for a while. This is the function that is responsible for generating the data stream (ignore the ptr_to_bytes_read, this is a code artifact that doesn't currently do anything):

def streaming_response(path, start, end=None):
    def make_response_chunk(path, start, chunk_size, length, ptr_to_bytes_read):
        print("start: {}, chunk_size: {}, length: {}".format(
            start, chunk_size, length))
        iters = length // chunk_size
        for i in range(iters):
            with open(path, 'rb') as fd:
                fd.seek(start)
                bytes_read = fd.read(chunk_size)
            errmsg = "Chunk size: {}, bytes read: {}".format(
                chunk_size,
                len(bytes_read))
            assert len(bytes_read) == chunk_size, errmsg
            ptr_to_bytes_read[0] = chunk_size
            print("Yielding {} bytes".format(chunk_size))
            yield bytes_read
            start += chunk_size
        print("made it to remainder read")
        remainder = length-chunk_size*iters
        if remainder > 0:
            with open(path, 'rb') as fd:
                fd.seek(start)
                bytes_read = fd.read(remainder)
            errmsg = "Remainder bytes: {}, bytes read: {}".format(
                remainder,
                len(bytes_read))
            #assert len(bytes_read) == remainder, errmsg
            ptr_to_bytes_read[0] = remainder
            yield bytes_read

    file_size = os.path.getsize(path)
    if end is None:
        length = file_size-start
    else:
        length = end-start
    msg = "Length must be non-negative. Found: {}".format(
            length)
    assert length >= 0, msg

    #chunk_size = 1 << 20 #one megabyte
    #chunk_size *= 10
    chunk_size = 1 << 17

    ptr_to_bytes_read = [0]
    response = Response(
        stream_with_context(
            make_response_chunk(path, start, chunk_size,
                length, ptr_to_bytes_read
            )
        ),
        206,
        mimetype=mimetypes.guess_type(path)[0],
        content_type=mimetypes.guess_type(path)[0],
        direct_passthrough=True,
    )
    response.headers.add(
        'Content-Range', 'bytes {0}-{1}/{2}'.format(
            start, start+chunk_size, file_size-1
        ),
    )
    response.headers.add(
        'Accept-Ranges', 'bytes'
    )
    return(response)

What's odd here is that streaming_response works fine when delivering .jpg image files, but it behaves kind of strangely when streaming audio. It sounds like the track starts playing twice, with one copy delayed slightly.

I've read all the usual threads online about how to fix broken pipe / connection reset issues and so far nothing's worked. Any help would be greatly appreciated.

sn1572 commented 3 years ago

Here's some additional info. This issue does not seem to occur when using the requests library, so it may be browser-related. By examining the request header that Chrome logs, I was able to send an identical request like this:

import requests

def get_chunk_1():
    URL = 'http://192.168.1.73:5000/shared/Gopro/GH010004.mp4'
    response = requests.get(URL,
        headers={'Accept': '*/*',
            'Accept-Encoding': 'identity;q=1, *;q=0',
            'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
            'Connection': 'keep-alive',
            'Cookie': 'hide-dotfile=no; session=<insert nasty hash here>',
            'Host': '192.168.1.73:5000',
            'Range': 'bytes=0-',
            'Referer': 'http://192.168.1.73:5000/shared/Gopro/',
            'User-Agent': 'Mozilla/5.0 (X11; CrOS x86_64 13421.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.199 Safari/537.36'
        }
    )
    return(response)

Since I am trying to stream .mp4 videos, a sufficient test of functionality was to do this:

with open('test.mp4', 'wb') as f:
    f.write(response.content)

and then verify that the video plays normally on my Chromebook, which it does. Checking the server logs verifies that it is providing the content as intended (you can see "yielding x byes" messages without interruption).

Furthermore, looking through Chrome logs I see an error NOTREACHED which is "always a code error" to paraphrase the Chrome dev tools. I would blame Chrome entirely except that we can't get Firefox to work either. :)

Still investigating.

sn1572 commented 3 years ago

This seems to be related: https://support.google.com/chrome/thread/25510119?hl=en

If true, Chrome issues a request for the whole file (bytes=0-) then intentionally breaks the connection at the TCP/IP level when it's read what it wants. After that it issues another request for the next chunk.

sn1572 commented 3 years ago

This is fixed now - not a wsgi issue. Turns out Chrome asks for the whole file (bytes 0-) but actually won't accept a response with too many bytes. This line of code caused Chrome to behave as expected, where max_length = 1<<20.

if max_length and length > max_length:
    length = max_length