themotte / rDrama

This code runs https://www.themotte.org. Forked from https://github.com/Aevann1/rDrama
GNU Affero General Public License v3.0

Figure out what's causing Python 3.11 instability #446

Open zorbathut opened 1 year ago

zorbathut commented 1 year ago

We have a problem where Python 3.11 is causing infrequent crashes on the live server. Some stacktrace screenshots:

[two stacktrace screenshots, images not included]

Things tried:

If you've got ideas on how to reproduce this, let me know, I'm happy to try stuff out.

justcool393 commented 1 year ago

this appears to be some sort of memory corruption. the crashes land in random places, including code that has no business crashing, and some other stack traces had "Garbage collecting..." as the top frame, so i'm relatively convinced of this.

python 3.11 iirc is notable for performance improvements, and if i were to guess (no evidence for this), something expects a thing to be in the same place it was in python 3.10, it's not there in python 3.11, and this is causing memory corruption.

given this i'm inclined to believe our culprit is one of (in descending order of likelihood)

  1. c extensions. we use a bunch of C code indirectly, and one of those that isn't playing nice with the correct memory allocation functions might be what's corrupting memory.
  2. python 3.11 itself. i find this unlikely, but i do think py3.11 is a factor.
  3. some random freak of nature that hates us specifically. this is prolly it tbh.
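
to narrow down candidate 1, a minimal sketch (not something from the repo, and `loaded_c_extensions` is a made-up helper name) that lists which compiled extension modules are actually loaded in the running process. it assumes the modules were imported from shared-object files on disk, so anything statically linked into the interpreter won't show up:

```python
import importlib.machinery
import sys


def loaded_c_extensions():
    """Return {module name: file path} for imported compiled extension modules.

    Hypothetical helper for illustration. It only sees modules that are
    already imported and that live in their own shared-object file.
    """
    # e.g. ('.cpython-311-x86_64-linux-gnu.so', '.abi3.so', '.so') on Linux
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = {}
    # Copy the items first: sys.modules can change while we iterate.
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if path and path.endswith(suffixes):
            found[name] = path
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, path in loaded_c_extensions().items():
        print(f"{name}: {path}")
```

running this inside the app after startup (rather than in a bare interpreter) would give the list of third-party extensions worth checking for 3.11 support.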

justcool393 commented 1 year ago

one of the strange things with this is that it's hard for me to reproduce. i'm curious, does the prod/dev deployment have any major differences from the docker version?

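we could probably try pointing valgrind at it to see what's blowing up, but that means running the server under valgrind. for a post-mortem we'd need a coredump, which is a job for gdb rather than valgrind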
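
until there's a coredump, a cheaper option might be the stdlib `faulthandler` module, which writes the python-level traceback of every thread to a file when the process dies on SIGSEGV, SIGABRT, SIGBUS, SIGFPE or SIGILL. a minimal sketch, where the helper name and the log path are made up for illustration and it's an assumption that this gets called early in app startup:

```python
import faulthandler


def enable_crash_tracebacks(log_path="faulthandler.log"):
    """Dump Python tracebacks to log_path if the interpreter hard-crashes.

    Hypothetical helper, not part of the codebase. The returned file object
    must stay open for the life of the process, because faulthandler writes
    to its file descriptor from inside the signal handler.
    """
    log_file = open(log_path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    crash_log = enable_crash_tracebacks()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

this won't say which C code corrupted memory, only where python was when it finally fell over, but it gives text stack traces instead of screenshots.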