unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi
Other
3.46k stars 691 forks source link

Random "Segmentation Fault" while receiving binary data from websocket (with gevent) #2225

Open igoras1993 opened 4 years ago

igoras1993 commented 4 years ago

Hi, I am currently experiencing Error continuation with empty previous buffer!!! uWSGI process 24142 got Segmentation Fault !!! error in my application. The code is quite complicated and I'm afraid that I'm not able to provide a complete minimal example.

Application stack (enumerating most important ones):

Backtrace:

emitting event "get_file_chunk" to 382c57b51aaa48bf9b8d604971dab346 [/transfer]
382c57b51aaa48bf9b8d604971dab346: Sending packet MESSAGE data 2/transfer,41["get_file_chunk","{\"md5\": \"30a37e7f03722850f850115fa5c3b12e\", \"start\": 0, \"length\": 65536}"]
emitting event "get_file_chunk" to 382c57b51aaa48bf9b8d604971dab346 [/transfer]
382c57b51aaa48bf9b8d604971dab346: Sending packet MESSAGE data 2/transfer,42["get_file_chunk","{\"md5\": \"62ed7bdf2d3130eb2ab6e972ed8f8fe6\", \"start\": 0, \"length\": 65536}"]
emitting event "get_file_chunk" to 382c57b51aaa48bf9b8d604971dab346 [/transfer]
382c57b51aaa48bf9b8d604971dab346: Sending packet MESSAGE data 2/transfer,43["get_file_chunk","{\"md5\": \"0436c6aafabcfb61d37d395e8e24d99f\", \"start\": 0, \"length\": 65536}"]
Error continuation with empty previous buffer!!! uWSGI process 24142 got Segmentation Fault !!!
*** backtrace of 24142 ***
/home/deploy/venv/bin/uwsgi(uwsgi_backtrace+0x2a) [0x562f3abc122a]
/home/deploy/venv/bin/uwsgi(uwsgi_segfault+0x23) [0x562f3abc1613]
/lib/x86_64-linux-gnu/libc.so.6(+0x3efd0) [0x7f1908e25fd0]
/home/deploy/venv/bin/uwsgi(uwsgi_buffer_destroy+0x4) [0x562f3aba50d4]
/home/deploy/venv/bin/uwsgi(+0x64d54) [0x562f3ab9dd54]
/home/deploy/venv/bin/uwsgi(uwsgi_websocket_recv_nb+0x1c) [0x562f3ab9dfcc]
/home/deploy/venv/bin/uwsgi(py_uwsgi_websocket_recv_nb+0x22) [0x562f3abdf192]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyCFunction_FastCallDict+0x1bb) [0x7f19093eb53b]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c1ec) [0x7f19093541ec]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f) [0x7f1909353a0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c0fc) [0x7f19093540fc]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f) [0x7f1909353a0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x1e4) [0x7f1909353d34]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x131) [0x7f190943efd1]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyObject_Call_Prepend+0xcd) [0x7f190943f6fd]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyObject_Call+0x48) [0x7f190943f768]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x1f489d) [0x7f19093cc89d]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x89) [0x7f190943ef29]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17bec8) [0x7f1909353ec8]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f) [0x7f1909353a0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c0fc) [0x7f19093540fc]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f) [0x7f1909353a0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c0fc) [0x7f19093540fc]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17c2be) [0x7f19093542be]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17ba0f) [0x7f1909353a0f]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x1e4) [0x7f1909353d34]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x131) [0x7f190943efd1]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyObject_Call_Prepend+0xcd) [0x7f190943f6fd]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyObject_Call+0x48) [0x7f190943f768]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x1f489d) [0x7f19093cc89d]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x89) [0x7f190943ef29]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17bec8) [0x7f1909353ec8]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x4ec3) [0x7f190935a303]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x17a803) [0x7f1909352803]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x2d5) [0x7f1909353e25]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x131) [0x7f190943efd1]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(_PyObject_Call_Prepend+0xcd) [0x7f190943f6fd]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyObject_Call+0x48) [0x7f190943f768]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(+0x1f489d) [0x7f19093cc89d]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyObject_Call+0x48) [0x7f190943f768]
/home/deploy/venv/bin/uwsgi(python_call+0x11) [0x562f3abd42a1]
/home/deploy/venv/bin/uwsgi(uwsgi_request_wsgi+0x22f) [0x562f3abd665f]
/home/deploy/venv/bin/uwsgi(py_uwsgi_gevent_request+0xbd) [0x562f3abe3b4d]
/usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0(PyCFunction_Call+0xc1) [0x7f19093eb761]
/home/deploy/venv/lib/python3.6/site-packages/gevent/_greenlet.cpython-36m-x86_64-linux-gnu.so(+0x34ad2) [0x7f18e823bad2]
*** end of backtrace ***
DAMN ! worker 1 (pid: 24142) died :( trying respawn ...
Respawned uWSGI worker 1 (new pid: 24186)
*** running gevent loop engine [addr:0x562f3abe3de0] ***

First 6 lines are coming from debug logs of Flask-Socketio application. Since there is no indication that my app have started processing incoming request, I cannot provide an exact line of code that produces this error. Also this means, that segfault is triggered before/after my app started processing request. The 6 lines of debug logs are indicating that my application is requesting some binary data (using flask-socketio here) from its client, and this data should be transferred using acknowledge callback (again socketio framework). Surely client transfers this data and uwsgi layer tries to receive it, but it gets this SEGFAULT error.

Here is the code that realizes that logic, but I am assuming that it is not that useful in this case:

def file_uploader(transfer_sid, file_md5, start=0, chunk_size=65536, timeout=0, max_retries=0):
    g_buffer = ChunkBuffer()

    def receive_chunk(chunk, abort=False):
        if abort:
            g_buffer.abort()

        try:
            g_buffer.write(chunk)
        except ValueError as e:
            log(f"Aborting upload due to value error: {e}")
            g_buffer.abort()

    chunk_start = start

    # initial request
    get_file_chunk(
        md5=file_md5,
        sid=transfer_sid,
        start=chunk_start,
        length=chunk_size,
        ack_callback=receive_chunk
    )
    waiting_since = time.time()
    retries_cnt = 0
    msg = None

    # waiting for chunks
    while True:
        # idle and wait in queue if data are not available
        if not g_buffer.is_new_available():
            waiting_for = time.time() - waiting_since

            # check consumer abort, max retries and timeout
            if msg == ABORT_MARK:
                upload_aborted(md5=file_md5, sid=transfer_sid, reason="consumer")
                log(f"Upload file {file_md5} of client's sid {transfer_sid} aborted: consumer.")
                yield ABORT_MARK
                raise StopIteration

            if timeout and waiting_for > timeout:
                upload_aborted(md5=file_md5, sid=transfer_sid, reason="timeout")
                log(f"Upload file {file_md5} of client's sid {transfer_sid} aborted: timeout {waiting_for}.")
                yield ABORT_MARK
                raise StopIteration

            elif max_retries and retries_cnt > max_retries:
                upload_aborted(md5=file_md5, sid=transfer_sid, reason="retries")
                log(f"Upload file {file_md5} of client's sid {transfer_sid} aborted: retries: {retries_cnt}.")
                yield ABORT_MARK
                raise StopIteration

            gevent.idle()
            continue

        # data available, read and process
        else:
            data = g_buffer.read()

            #
            if data == ChunkBuffer.ABORT_MARK:
                upload_aborted(md5=file_md5, sid=transfer_sid, reason="request")
                log(f"Upload file {file_md5} of client's sid {transfer_sid} aborted: client request.")
                yield ABORT_MARK
                raise StopIteration

            elif len(data) < chunk_size:
                if len(data) != 0:
                    msg = yield data
                else:
                    upload_completed(md5=file_md5, sid=transfer_sid)
                    log(f"Upload file {file_md5} of client's sid {transfer_sid} completed.")
                    raise StopIteration

                upload_completed(md5=file_md5, sid=transfer_sid)
                log(f"Upload file {file_md5} of client's sid {transfer_sid} completed.")
                raise StopIteration

            elif len(data) == chunk_size:
                msg = yield data

                # fetch next chunk
                chunk_start += chunk_size
                get_file_chunk(
                    md5=file_md5,
                    sid=transfer_sid,
                    start=chunk_start,
                    length=chunk_size,
                    ack_callback=receive_chunk
                )
                waiting_since = time.time()
                retries_cnt = 0

            elif len(data) > chunk_size:

                # fetch once more
                get_file_chunk(
                    md5=file_md5,
                    sid=transfer_sid,
                    start=chunk_start,
                    length=chunk_size,
                    ack_callback=receive_chunk
                )
                waiting_since = time.time()
                retries_cnt += 1

This is a generator that is instantiated and then app code iterates ofer it. get_file_chunk() function is emiting event get_file_chunk and receive_chunk() is the actual callback that should be invoked, but is not because of uWSGI fault.

Server layers looks like this:

Nginx ---(uwsgi_pass)---> uWSGI instance ---> python app

Nginx configuration (a part of it):

     location /socket.io {
         include uwsgi_params;
         uwsgi_buffering off;
         uwsgi_pass 127.0.0.1:5050; #socket.io;
     }

Uwsgi instance configuration (.ini file):

[uwsgi]
module = run_wsgi:app

socket = 127.0.0.1:5050
master = true
processes = 1
http-websockets = true
buffer-size=32768
gevent = 800

cache2 = name=wedbox,items=300,blocksize=1500000,purge_lru=true

vacuum = true

die-on-term = true

This don't happens always, usualy everything is okay, but when server is on some load, this will ocur once per ~100 chunks beeing requested.

Any ideas?

jesoonluo commented 3 years ago

I have the same error when I use python3.8.6 but all fine with 3.6.8. but if do not set uwsgi param gevent,request is pending always.