ionelmc / python-hunter

Hunter is a flexible code tracing toolkit.
https://python-hunter.readthedocs.io/
BSD 2-Clause "Simplified" License
794 stars 46 forks source link

Reduce gdb deadlocks #105

Open aantn opened 2 years ago

aantn commented 2 years ago

When using python-hunter's remote feature, a warning is printed that the process may deadlock. (See here.)

I strongly suspect this can be fixed by setting the gdb flag scheduler-locking off. IDEs like VSCode reliably inject code into Python processes to support debugging existing processes. VSCode's implementation explicitly mentions the importance of setting scheduler-locking off to prevent deadlocks

I unfortunately don't have the time to open a PR for python-hunter at the moment, but I do plan to maintain a fork of pyrasite with this and other fixes. I will update here if I find additional bugs that can cause deadlocks.

aantn commented 2 years ago

To clarify, this is a different deadlock than the one pydevd deals with as pydevd code uses PyRun_SimpleString whereas python-hunter uses Py_AddPendingCall. But you can see in the Python source code that Py_AddPendingCall acquires a mutex so not using scheduler-locking off could very conceivably lead to a deadlock here.

ionelmc commented 2 years ago

Well technically it could deadlock - if you manage to attach hunter via gdb at the exact time where a signal would be delivered to the process (signal delivery would use the Py_AddPendingCall routines). It's still better than using PyRun_SimpleString directly tho (that can deadlock way more easier - basically if you run it while something holds the GIL you're in for very unpredictable behavior - including deadlocks). That's why I used Py_AddPendingCall - it's the correct way with regard to the GIL.

I disagree with your assessment that VSCode reliably injects code into Python processes, or any debugger that relies on PyRun_SimpleString directly.

set scheduler-locking off basically allows other threads to run while gdb runs whatever code is injected - I guess we could add that for good measure but I wouldn't delete the warning just because there's one less chance for things to go sideways :-)

Make a PR with the change and lets see how it works.

aantn commented 2 years ago

Regarding PyRun_SimpleString, assuming you call PyGILState_Ensure first (all the debuggers do) and other threads are not stopped at that point in time, why is a deadlock possible?

ionelmc commented 2 years ago

How would the GIL get released for a single thread process if you pause said process while the GIL is held somewhere?

aantn commented 2 years ago

As far as I understand, PyGILState_Ensure is re-entrant so you can successfully call it from the thread that gdb stopped even if that thread already holds the GIL. If another thread holds the GIL, it's OK because scheduler-locking off will allow that thread to keep on running until it releases the GIL.

ionelmc commented 2 years ago

To be honest I don't know the cpython internals well enough to say it's right or wrong. From what I've tried the Py_AddPendingCall worked really well while PyRun_SimpleString didn't. Maybe all I missed was the scheduler-locking off - who knows. But we can integration test which technique is really the best, there are already some integration tests on the remote with gdb feature - they could be extended to include thread and all sorts of crazy stuff.

aantn commented 2 years ago

Cool, I'll give the tests a look. I've been meaning to play around with python-hunter in depth for a long time. I hope to have the justification to do so at my day job soon :)