lookit / lookit-api

Codebase for Lookit v2 and Experimenter v2. Includes an API. Docs: http://lookit.readthedocs.io/
https://lookit.mit.edu/
MIT License

CRITICAL: µwsgi segfaulting causing infinite crashing #581

Closed by Datamance 4 years ago

Datamance commented 4 years ago

After the staging deploy triggered by the changelog introduction, µwsgi logs are showing some very concerning messages:

uwsgi /usr/local/lib/libpython3.8.so.1.0(+0x12b099) [0x7f670baa5099]
uwsgi /usr/local/lib/libpython3.8.so.1.0(+0x12923b) [0x7f670baa323b]
uwsgi /usr/local/lib/libpython3.8.so.1.0(PyObject_CallMethod+0xb2) [0x7f670baa3072]
uwsgi uwsgi(+0xb1664) [0x55f6fa4a7664]
uwsgi uwsgi(uwsgi_ignition+0x112) [0x55f6fa4805f2]
uwsgi uwsgi(uwsgi_worker_run+0x25e) [0x55f6fa484e2e]
uwsgi uwsgi(uwsgi_run+0x434) [0x55f6fa485384]
uwsgi uwsgi(+0x3cf3e) [0x55f6fa432f3e]
uwsgi /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7f670b7db09b]
uwsgi uwsgi(_start+0x2a) [0x55f6fa432f6a]
uwsgi *** end of backtrace ***
uwsgi DAMN ! worker 1 (pid: 163) died :( trying respawn ...
uwsgi Respawned uWSGI worker 1 (new pid: 164)
uwsgi *** running gevent loop engine [addr:0x55f6fa4a7100] ***
uwsgi !!! uWSGI process 164 got Segmentation Fault !!!
uwsgi *** backtrace of 164 ***
uwsgi uwsgi(uwsgi_backtrace+0x2a) [0x55f6fa48006a]
uwsgi uwsgi(uwsgi_segfault+0x23) [0x55f6fa480423]
uwsgi /lib/x86_64-linux-gnu/libc.so.6(+0x37840) [0x7f670b7ee840]

Builds are also showing worrying warnings about greenlet:

Step #4 - "run-tests": <frozen importlib._bootstrap>:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
Step #4 - "run-tests": <frozen importlib._bootstrap>:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject
Step #4 - "run-tests": <frozen importlib._bootstrap>:219: RuntimeWarning: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 144 from C header, got 152 from PyObject

This is in progress; I am working on fixing the build.

Datamance commented 4 years ago

The bad build picked up a version bump:

Step #3 - "build-image": Collecting greenlet>=0.4.16; platform_python_implementation == "CPython"
Step #3 - "build-image":   Downloading greenlet-0.4.17-cp38-cp38-manylinux1_x86_64.whl (48 kB)

On the last good build:

Step #3 - "build-image": Collecting greenlet>=0.4.16; platform_python_implementation == "CPython"
Step #3 - "build-image":   Downloading greenlet-0.4.16-cp38-cp38-manylinux1_x86_64.whl (48 kB)

Fun. This is what happens when we have non-deterministic builds. It looks like the greenlet project merged the 0.4.17 tag 13 days ago, so the unpinned transitive requirement changed under our feet and borked µwsgi.
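The comparison above was done by eye. A rough sketch of automating it: pull the resolved wheel versions out of two build logs and report what changed. The function names are hypothetical, and the regex only assumes pip's `Downloading <name>-<version>-<tags>.whl` line shape as it appears in the logs above.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches pip's wheel download lines, e.g.
#   Downloading greenlet-0.4.17-cp38-cp38-manylinux1_x86_64.whl (48 kB)
WHEEL_LINE = re.compile(
    r"Downloading (?P<name>[A-Za-z0-9_.]+)-(?P<version>[0-9][^-\s]*)-\S+\.whl"
)


def resolved_versions(log_lines: Iterable[str]) -> Dict[str, str]:
    """Map package name -> version for every wheel download in a log."""
    versions: Dict[str, str] = {}
    for line in log_lines:
        match = WHEEL_LINE.search(line)
        if match:
            versions[match.group("name").lower()] = match.group("version")
    return versions


def version_drift(
    good_log: Iterable[str], bad_log: Iterable[str]
) -> Dict[str, Tuple[str, str]]:
    """Return {package: (good_version, bad_version)} for changed packages."""
    good = resolved_versions(good_log)
    bad = resolved_versions(bad_log)
    return {
        name: (good[name], bad[name])
        for name in sorted(good.keys() & bad.keys())
        if good[name] != bad[name]
    }


good = ['Step #3 - "build-image":   Downloading greenlet-0.4.16-cp38-cp38-manylinux1_x86_64.whl (48 kB)']
bad = ['Step #3 - "build-image":   Downloading greenlet-0.4.17-cp38-cp38-manylinux1_x86_64.whl (48 kB)']
print(version_drift(good, bad))  # {'greenlet': ('0.4.16', '0.4.17')}
```

This only sees packages downloaded as wheels in both builds; source distributions and cached installs would need extra patterns.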

Here is the related bug filed with the greenlet team.

I've scaled the replicas for the web service down to 0 for now. Will issue a bugfix shortly that just pins greenlet to the previous version (0.4.16).

Datamance commented 4 years ago

Actually, let's try upgrading gevent and pinning greenlet to 0.4.17 first.

Datamance commented 4 years ago

"Fixed", and by "fixed" I mean this is a stopgap until we have something like #562 to prevent transitive dependency drift from screwing up future builds.
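Until something like that lands, a cheap guard is to check at build or startup time that the installed versions match the pins. This is only a sketch under assumptions: the `PINNED` mapping and `find_pin_violations` are hypothetical names, the check is an exact string comparison, and it is not a description of what #562 proposes.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Hypothetical pin set; 0.4.17 is the greenlet version tried in this thread.
PINNED: Dict[str, str] = {"greenlet": "0.4.17"}


def find_pin_violations(
    pins: Dict[str, str],
    installed_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return a message for each package whose installed version differs
    from its pin. The lookup is injectable so it can be tested offline."""
    problems: List[str] = []
    for name, expected in pins.items():
        try:
            actual = installed_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {expected})")
            continue
        if actual != expected:
            problems.append(f"{name}: installed {actual}, expected {expected}")
    return problems
```

Calling `find_pin_violations(PINNED)` in a build step and failing when the list is non-empty would surface a drifted transitive dependency before µwsgi gets a chance to crash on it.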