patroclos / PAmix

ncurses/curses pulseaudio mixer in c++ similar to pavucontrol
MIT License
140 stars 10 forks source link

Race condition for concurrently started/stopped applications #68

Closed m7a closed 3 years ago

m7a commented 3 years ago

When running speaker-test I found out that occasionally, pamix will hang at a black screen i.e. no longer update the display and also no longer react to input (except CTRL-C, CTRL-\ to quit the program). Upon debugging this with valgrind, I get the following output:

==20675== Invalid read of size 8
==20675==    at 0x4A68487: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20675==    by 0x155A94: pamix_ui::getEntryDisplayName[abi:cxx11](Entry*) (pamix_ui.cpp:207)
==20675==    by 0x155145: pamix_ui::redrawAll() (pamix_ui.cpp:116)
==20675==    by 0x15117B: main (pamix.cpp:249)
==20675==  Address 0x10 is not stack'd, malloc'd or (recently) free'd

As far as I can tell, it is related to the maps accessed from pamix_ui being concurrently written to by painterface or possibly by the data behind the pointers from the map being deleted/overwritten just while being processed in pamix_ui.

speaker-test is not reliable in reproducing this issue, however, I found a means to reproduce it within less than a minute here (running two concurrent pamix instances increases chances of the problem to appear, running more than four concurrently seems to reduce the chance as processing slows down..)

Steps to reproduce:

  1. Prepare an audio file test.wav with just about a second of content.
  2. Run pamix in a terminal
  3. Run while true; do mpv test.wav; done in another terminal

Expected behaviour: New outputs will appear in pamix for short time and then vanish in short sequence. Pamix is expected to continue to function.

Observed behaviour: After some point, the terminal with pamix blackens and no further interaction with pamix is possible. Stopping the loop of mpv process does not recover pamix.

Suggested fix: (a) Use locking for concurrent read-write accesses (not only for write-write as it seems to be at the moment?) or (b) switch to concurrent data structures. I am not much of a C++ expert and I do not understand the idea behind the various locks in painterface.cpp thus am not sure wheter I could come up with a patch...

I am currently using version bf4acb38200062784f3ffc484ded6b2ce5a57162 but the old version from Debian (1.6~git20180112.ea4ab3b-3) seems to be affected by this bug, too.

Edit: It can be reproduced with fbdb80646d3e8c3542e0ad147c65a430e1a45f52 (dev), too. I also found out that there is already an existing issue about it: https://github.com/patroclos/PAmix/issues/59

patroclos commented 3 years ago

Thanks, that's quite some detailed digging you did. I appreciate it. I struggled a bit to reproduce this with your suggested steps, but when exiting the loop by holding Ctrl-C pamix blacked out. Maybe we should add a stress-test suite to the project to weed out these kinds of problems...

I swapped out the drawing mutex for the writing mutex in pamix_ui and that seems to deal with this problem. I don't remember why I used different mutexes in the first place, maybe just some premature optimization or something.

If you want to test the changes on your system try the dev branch on d9c632ecfa787de07c5e2d396f532c7ef61d5d84

m7a commented 3 years ago

I am not sure if such things can be tested comprehensively with sensible effort :). It is also difficult to deploy tests for problems that do not always occur as regression tests, because in the end one will get varying test results on the same code?

I can confirm that the newly uploaded changes to dev do indeed fix the issue. I ran it 8 hours with six pamix instances distributed across three virtual machines and none of them crashed whereas before, the problem would show up within less than a minute.

Thank you very much for the quick reply and fix, pamix is really appreciated!

patroclos commented 3 years ago

Glad to hear that.

Yeah, I maybe phrased it unclearly, but I was thinking about a tool to do manual testing instead of a CI pipeline kind of test. Sure, it's not gonna catch every problem every time, that's just in the nature of race conditions, but I feel like it would help establishing some confidence in the current state of things.