Here is how the deadlock happens; each thread holds one mutex and blocks trying to acquire the other:

```
main thread                python update thread
casEventSys::process       casPVI::postEvent
casPVI::nativeCount        casEventSys::postEvent
```
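A toy Python model of this lock-order inversion; the two Lock objects are illustrative stand-ins for the casEventSys and casPVI mutexes, not EPICS code:

```python
import threading
import time

event_sys_lock = threading.Lock()  # stand-in for the casEventSys mutex
pvi_lock = threading.Lock()        # stand-in for the casPVI mutex

def main_thread():
    with event_sys_lock:       # casEventSys::process holds this lock...
        time.sleep(0.1)        # widen the race window for the demo
        with pvi_lock:         # ...then casPVI::nativeCount needs this one
            pass

def update_thread():
    with pvi_lock:             # casPVI::postEvent holds this lock...
        time.sleep(0.1)        # widen the race window for the demo
        with event_sys_lock:   # ...then casEventSys::postEvent needs this one
            pass

t1 = threading.Thread(target=main_thread)
t2 = threading.Thread(target=update_thread)
t1.start(); t2.start()
t1.join(); t2.join()  # hangs forever: each thread waits for the other's lock
```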
Hi Xiaoqiang, do you have a pcaspy example that I can run to reproduce this? Thanks,
Hi Bruce,
I used simscope.py as the test program.
After building pcaspy 0.6.2 or the current HEAD in the repo,
$ cd example
# make a soft link to the current build
$ ln -s ../build/lib.linux-x86_64-2.6/pcaspy .
$ python simscope.py
In another terminal, launch the MEDM panel, change the Update time to 0.01 seconds, and click Run.
medm -x -macro P=MTEST simscope.adl
After a few seconds (in my case), the sine wave stops updating. If you now attach gdb to the PCASpy process and run
(gdb) thread apply all bt
the backtrace shows that two threads are stuck in pthread_mutex_lock, as shown in the first post.
The official release of EPICS base 3.14.12.6 has fixed this deadlock.
Note that the fix mentioned above only remedies one particular race condition, but there are more...
I filed a bug report here.
The deadlock you discovered is indeed serious.
In theory the deadlock should happen quickly if I run "caput -c" on a fast-changing PV. I used simscope.py as the test program (updating at 100 Hz) and ran "caput -c" at 50 Hz:
count=1; while true; do caput -c MTEST:MinValue $((count++)); sleep 0.02; done
However, I could not reproduce it this way. In what usage do you see this deadlock, in a CA gateway?
Currently all pcaspy applications assume that they can call Driver::updatePVs() from a separate thread; the main thread then picks up this request and sends the monitor events to clients. A minimal sketch of this pattern is shown below.
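This is roughly what simscope.py does, stripped to the essentials; the PV name and update period here are illustrative:

```python
import random
import threading
import time

from pcaspy import Driver, SimpleServer

prefix = 'MTEST:'
pvdb = {
    'RAND': {'prec': 3},
}

class MyDriver(Driver):
    def __init__(self):
        super(MyDriver, self).__init__()
        # Python update thread: updatePVs() posts monitor events,
        # going through casPVI::postEvent / casEventSys::postEvent.
        self.tid = threading.Thread(target=self.run)
        self.tid.daemon = True
        self.tid.start()

    def run(self):
        while True:
            self.setParam('RAND', random.random())
            self.updatePVs()
            time.sleep(0.01)

if __name__ == '__main__':
    server = SimpleServer()
    server.createPV(prefix, pvdb)
    driver = MyDriver()
    # Main thread: server.process() drains the event queue
    # (casEventSys::process) and sends the monitor events to clients.
    while True:
        server.process(0.1)
```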
In the 3.14.12.6-rc1 and 3.15.5-rc1 releases, the pcas server received an update to support dynamic-length arrays. The relevant change is the additional call to [chan.getPVI().nativeCount()](https://github.com/epics-base/epics-base/commit/07434172319922b287ce6277b89ce2e82bced3cb#diff-4eafaaeaa6480eaed1cfd866d63cd0adR886) inside casStrmClient::monitorResponse. When the deadlock happens, here is the backtrace from the main thread (thread 1) and the python thread (thread 2):