Closed m7a closed 3 years ago
Thanks, that's quite some detailed digging you did. I appreciate it. I struggled a bit to reproduce this with your suggested steps, but when exiting the loop by holding Ctrl-C pamix blacked out. Maybe we should add a stress-test suite to the project to weed out these kinds of problems...
I swapped out the drawing mutex for the writing mutex in pamix_ui and that seems to deal with this problem. I don't remember why I used different mutexes in the first place, maybe just some premature optimization or something.
If you want to test the changes on your system try the dev branch on d9c632ecfa787de07c5e2d396f532c7ef61d5d84
I am not sure if such things can be tested comprehensively with sensible effort :). It is also difficult to deploy tests for problems that do not always occur as regression tests, because in the end one will get varying test results on the same code?
I can confirm that the newly uploaded changes to dev
do indeed fix the issue. I ran it 8 hours with six pamix
instances distributed across three virtual machines and none of them crashed whereas before, the problem would show up within less than a minute.
Thank you very much for the quick reply and fix, pamix
is really appreciated!
Glad to hear that.
Yeah, I maybe phrased it unclearly, but I was thinking about a tool to do manual testing instead of a CI pipeline kind of test. Sure, it's not gonna catch every problem every time, that's just in the nature of race conditions, but I feel like it would help establishing some confidence in the current state of things.
When running
speaker-test
I found out that occasionally,pamix
will hang at a black screen i.e. no longer update the display and also no longer react to input (except CTRL-C, CTRL-\ to quit the program). Upon debugging this with valgrind, I get the following output:As far as I can tell, it is related to the maps accessed from
pamix_ui
being concurrently written to bypainterface
or possibly by the data behind the pointers from the map being deleted/overwritten just while being processed inpamix_ui
.speaker-test
is not reliable in reproducing this issue, however, I found a means to reproduce it within less than a minute here (running two concurrentpamix
instances increases chances of the problem to appear, running more than four concurrently seems to reduce the chance as processing slows down..)Steps to reproduce:
test.wav
with just about a second of content.pamix
in a terminalwhile true; do mpv test.wav; done
in another terminalExpected behaviour: New outputs will appear in pamix for short time and then vanish in short sequence. Pamix is expected to continue to function.
Observed behaviour: After some point, the terminal with
pamix
blackens and no further interaction withpamix
is possible. Stopping the loop ofmpv
process does not recoverpamix
.Suggested fix: (a) Use locking for concurrent read-write accesses (not only for write-write as it seems to be at the moment?) or (b) switch to concurrent data structures. I am not much of a C++ expert and I do not understand the idea behind the various locks in
painterface.cpp
thus am not sure wheter I could come up with a patch...I am currently using version bf4acb38200062784f3ffc484ded6b2ce5a57162 but the old version from Debian (1.6~git20180112.ea4ab3b-3) seems to be affected by this bug, too.
Edit: It can be reproduced with fbdb80646d3e8c3542e0ad147c65a430e1a45f52 (dev), too. I also found out that there is already an existing issue about it: https://github.com/patroclos/PAmix/issues/59