psmokotnin / osm

Open sound meter. FFT based application for tuning sound systems.
GNU General Public License v3.0
222 stars 35 forks source link

Deadlock in ALSA plugin #56

Closed NexAdn closed 11 months ago

NexAdn commented 1 year ago

I've found a deadlock in the ALSA plugin, causing the whole program to freeze every time I try to switch the input in OpenSoundMeter. To trace the issue down, I've inserted a bunch of std::cout statements in the code, which also show where the mutexes get locked and unlocked:

Measurement::updateAudio(): async entry
Measurement::updateAudio(): async pre-setCallback
Measurement::updateAudio(): async pre-getInstance
Measurement::updateAudio(): async pre-openInput
audio::Client::openInput(): entry
audio::Client::openInput(): pluginForDevice()
audio::Client::openInput(): targetPlugin->open
0x557750193a90  ALSA
AlsaPlugin::open() entry, 0x557750193ac0 locking
AlsaPlugin::open() locked
AlsaPlugin::open(): Creating new AlsaPCMDevice
AlsaPlugin::open(): starting device
AlsaPlugin::open(): setKeepAlive()
AlsaPlugin::open(): new Stream()
AlsaPlugin::open(): endpoint->open()
AlsaPlugin::open(): Waiting for device->active()
AlsaPlugin::open(): Returning stream, unlocking 0x557750193ac0
audio::Client::openInput(): return
Measurement::updateAudio(): async pre-connect
Measurement::updateAudio(): async pre-emit
Measurement::updateAudio(): async exit
Invalid QML element name "meta::Filter"; type names must begin with an uppercase letter
Invalid QML element name "meta::Measurement"; type names must begin with an uppercase letter
Qt Quick Layouts: Detected recursive rearrange. Aborting after two iterations.
Qt Quick Layouts: Detected recursive rearrange. Aborting after two iterations.
Qt Quick Layouts: Detected recursive rearrange. Aborting after two iterations.
AlsaPCMDevice::start() locking 0x557750193ac0
Measurement::updateAudio(): async entry
Measurement::updateAudio(): async pre-setCallback
Measurement::updateAudio(): async pre-getInstance
Measurement::updateAudio(): async pre-openInput
audio::Client::openInput(): entry
audio::Client::openInput(): pluginForDevice()
audio::Client::openInput(): targetPlugin->open
0x557750193a90  ALSA
AlsaPlugin::open() entry, 0x557750193ac0 locking

as far as I can see it, the main issue is that the AlsaPCMDevice holds a lock on the mutex AlsaPlugin::m_deviceListMutex as long as it isn't closed. When switching the device, the old AlsaPCMDevice isn't closed and thus the mutex isn't unlocked, causing the deadlock.

Note that I was sometimes unable to reproduce the issue with CONFIG+="debug", as long as I inserted std::cout statements. Thus, there is some weird timing issue, which may cause problems when trying to reproduce the issue.

System: Gentoo Linux with Qt 5.15.5 installed, using OSM version 1.2.1 (tried both the AppImage and self-compiled from master)

NexAdn commented 1 year ago

Apart from that, I've noticed that the objects seem to hold that specific mutex probably way longer than they actually need to. This may encourage deadlocks.

psmokotnin commented 1 year ago

@NexAdn Thank you, I'll take a look. Or you can make a PR with a fix.

NexAdn commented 1 year ago

PR is out. But note that I've noticed some other issue after applying the fix, so I'm a bit unsure whether that fix is actually correct. In any case, it should serve at least as a reference on how to fix the underlying problem.

FelixHeppner commented 1 year ago

I face the same issue. The (slightly modified, see below) second commit by NexAdn fixes it for me. Thanks a lot!

My C++ skills haven't been used a lot recently... But what is the reason for locking the list mutex at all in end of thread code line 329 in alsa.cpp ? Nothing is done holding the lock except for access to flags. The list locking is fine for add and remove in AlsaPlugin:open imho. I removed locking in the end of thread completely and switch input cards works as expected.

NexAdn commented 1 year ago

But what is the reason for locking the list mutex at all in end of thread code line 329 in alsa.cpp ?

I honestly don't know, but I assume that there must be something, since the mutex is explicitly locked there (although I have noticed that this mutex is locked way longer than needed in this plugin – hence my introduction of std::lock_guard and std::unique_lock in the places where the device list is accessed. But yeah, looking at the code at a quick glance I see nothing justifying locking the mutex there. I guess the locking in this plugin might need some reworking in general, since I consider passing a mutex between multiple objects a bad idea anyway. Then again, I don't know the software's architecture that well—only enough to file my two still unmerged PRs.

I don't know how long it will take for upstream to properly fix the matter or merge my PR, but I hope it doesn't take that much longer (although I've been using a local fork with my two patchsets applied as an interim solution to have a working software). But I assume Linux isn't a high priority for this project, since most of the proprietary stuff for system design and tuning in general is Windows or Mac only anyway and thus Linux users in this area are probably almost nonexistent.