mixxxdj / mixxx

Mixxx is Free DJ software that gives you everything you need to perform live mixes.
http://mixxx.org
Other
4.53k stars 1.28k forks source link

tsan fix browsethread #13872

Open m0dB opened 1 week ago

m0dB commented 1 week ago

Fixes #13861 and improves the code quality of BrowseThread a bit.

m0dB commented 1 week ago

Some comments. I have not yet fully understand the issue.

To explain the issue: tsan reported a data race in browsetablemodel.cpp:163 when connecting the table model with the result from BrowseThread::getInstanceRef(). IIRC the fix is waiting until the thread is started before returning from getInstanceRef (in the BrowseThread constructor) but I also changed getInstanceRef itself because the placing of the mutex didn’t seem correct to me.

The other changes are mainly esthetics / code clarity.

I do have my doubts about this code, it seems strange to me that a new populate request for any browsetablemodel aborts the running request. But I didn’t want to change to behavior.

m0dB commented 1 week ago

I have checked with Qt 6 with thread sanitizer activated through the Qt feature flags, and it still has to same problem, so I would like to ask to merge this PR. I feel I addressed all comments.

m0dB commented 5 days ago

Is there is a reason not to merge this? I confirm that, including with a thread sanitizer enabled Qt, I get TSAN warnings unless I wait for the thread to have started.

daschuer commented 2 days ago

I did a bit of a research and discovered a related QT issues.

https://bugreports.qt.io/browse/QTBUG-100336 .. looks quite similar. My guess is that the different backtrace happens due to not optimized code.

Can you please use the attached the testing code and confirm that this is actually our issue?

This bug is fixed in: Qt v6.8.0 and planned to 6.5 the stable branch we use for Mixxx 2.5 https://github.com/qt/qtbase/commit/75d82afa0d3aad9b4f9857e439535fc49c4616bc

I was able to trace it down to https://github.com/qt/qtbase/commit/34fe9232dbf6a9fe58ebc4c7680bb67d2f642c40

receiver->d_func()->connections.loadRelaxed()->currentSender = previous

Where it has been portet from QAtomic in v5.14.0 But the issue exists way longer with the old API.

The issue seems affect a bunch of atomic variables and whenever a connection is astabisched between threads. Most likely this fix fixes only one symptome of the issue and the issue will happen again with the other signals. Luckily this is currently "only" a linter issue and we do not have a real user issue with that, I would prefer to revert the quite bulky and run-time consuming solution here and just wait for the upstream fix. The other refactorings in this thread are welcome.

An alternative solution for all signals would be to connect the the connections via "m_model_observer" from the worker thread itself, this way the affected variables are only touched by one thread. I consider this a nice refactoring that can remain even if the upstrem issue is fixed. But I don't want to put that work on you. You may also directky "QMetaObject::invokeMethod" without connecting.