spotify / pedalboard

🎛 🔊 A Python library for audio.
https://spotify.github.io/pedalboard
GNU General Public License v3.0

Avoid deadlocks and nondeterministic results when using the same AudioFile in multiple threads. #298

Closed by psobot 3 months ago

psobot commented 3 months ago

This one's a big one. Sorry in advance.

Prior to this PR, Pedalboard allowed multiple threads to call methods on the same AudioFile simultaneously. Each AudioFile object included an objectLock, intended to serialize this access and ensure "thread safety" (although this guarantee was vacuous: how meaningful are the results returned by a file-like object that multiple threads are manipulating simultaneously?).

This was not a problem when reading files from disk. However, AudioFile also permits the caller to provide a file-like object (io.BytesIO, etc.), which can be implemented in Python. In that case, concurrent access to the same AudioFile object could cause hard deadlocks in the Python interpreter.
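To make "a file-like object implemented in Python" concrete, here is a minimal sketch using only the standard library. RecordingBytesIO and read_from_two_threads are hypothetical names made up for this illustration; they are not part of Pedalboard. The point is only that every read() on such an object runs Python code, so whoever calls it from native code has to hold the GIL first.

```python
import io
import threading


class RecordingBytesIO(io.BytesIO):
    """Hypothetical file-like object whose read() is implemented in Python."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.reader_threads = []  # names of the threads that called read()

    def read(self, size=-1):
        # This body is Python bytecode, so it can only run while the calling
        # thread holds the GIL.
        self.reader_threads.append(threading.current_thread().name)
        return super().read(size)


def read_from_two_threads(file_like, chunk_size=4):
    """Call read() on the same object from two threads and wait for both."""
    threads = [
        threading.Thread(target=file_like.read, args=(chunk_size,), name=name)
        for name in ("A", "B")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(file_like.reader_threads)
```

An object like this is the kind of thing that could be handed to AudioFile in place of a path; the sketch deliberately does not call Pedalboard, so it only shows the Python side of the callback.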

Consider the following example, in which the thread holding the GIL is annotated with 🐟, and the thread holding the AudioFile object's lock is annotated with 🔒:

| Thread A | Thread B |
| --- | --- |
| 🐟 Call AudioFile.read(...) | |
| 🐟 Acquire the AudioFile's objectLock 🔒 | |
| 🔒 🐟 Release Python's GIL | |
| 🔒 | 🐟 Call AudioFile.read(...) |
| 🔒 | 🐟 Wait for AudioFile's objectLock 🔒 |
| 🔒 Call .read(...) on the file-like object | 🐟 |
| 🔒 Wait for Python's GIL (to call back into Python) | 🐟 |
| 🔒 ⏳ deadlocked | 🐟 ⏳ deadlocked |
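The interleaving above can be replayed in pure Python as a rough model. This is a sketch under stated assumptions, not Pedalboard's code: `gil` and `object_lock` are ordinary threading.Lock objects standing in for the real GIL and the objectLock, the function name is hypothetical, and timeouts are used so the cycle is reported instead of hanging forever.

```python
import threading


def simulate_lock_cycle(timeout=0.2):
    """Replay the two-thread interleaving with stand-in locks.

    Returns a dict saying whether each thread obtained the lock it was
    waiting for. In this interleaving both waits time out.
    """
    gil = threading.Lock()          # stand-in for Python's GIL (assumption)
    object_lock = threading.Lock()  # stand-in for AudioFile's objectLock
    a_released_gil = threading.Event()
    b_holds_gil = threading.Event()
    both_waited = threading.Barrier(2)
    acquired = {}

    def thread_a():
        with gil:                  # running Python, calls read()
            object_lock.acquire()  # takes the object lock
        # The stand-in GIL is released here, before the "native" work.
        a_released_gil.set()
        b_holds_gil.wait()
        # Needs the GIL back to call .read() on the Python file-like object.
        acquired["A got GIL"] = gil.acquire(timeout=timeout)
        both_waited.wait()         # hold object_lock until B has given up too
        if acquired["A got GIL"]:
            gil.release()
        object_lock.release()

    def thread_b():
        a_released_gil.wait()
        gil.acquire()              # running Python, calls read()
        b_holds_gil.set()
        # Waits for the object lock without giving up the GIL.
        acquired["B got objectLock"] = object_lock.acquire(timeout=timeout)
        both_waited.wait()         # hold the GIL until A has given up too
        if acquired["B got objectLock"]:
            object_lock.release()
        gil.release()

    threads = [threading.Thread(target=thread_a), threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acquired
```

Each thread keeps the lock it already holds until the other has finished waiting, which mirrors the table: neither wait can succeed. Without the timeouts, which the real locks in this scenario did not have, both threads would block indefinitely.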

This situation resulted in an uninterruptible Python interpreter (i.e., Ctrl-C would not work): the GIL was held by a thread that was blocked in C++ code, so no Python signal handlers could run.

This PR makes significant changes to how AudioFile handles locking:

This PR also includes some new and comprehensive testing around locking; however, these tests are beasts. Again, sorry for the complexity there.