Closed by philipstarkey 5 years ago
Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).
This is effectively a deadlock, except that fortunately the zprocess event times out (although then the shot is aborted due to that failure).
This depends on the device class. For the NI-PCIe-6363, the wait_durations_analysed event has no timeout, as the wait() method of zprocess.Event does not time out by default. This means that wait_durations_analysed.wait(self.h5file) hangs indefinitely if the deadlock occurs.
Another example from the spielman fork is the NI-USB-6343.
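To illustrate the difference between a bounded and an unbounded wait, here is a minimal sketch that uses the standard library's threading.Event as a stand-in for zprocess.Event. This is an assumption made for illustration only: zprocess.Event has its own signature (the call quoted above passes self.h5file) and may report a timeout differently. The helper name wait_for_analysis is hypothetical.

```python
import threading


def wait_for_analysis(event, timeout=None):
    """Block until the event is set; return False if the wait timed out.

    With timeout=None this blocks forever if nobody sets the event,
    which is the hang described above.
    """
    return event.wait(timeout)


# Stand-in for wait_durations_analysed (threading.Event, not zprocess.Event).
analysed = threading.Event()

# Nobody has set the event yet: a bounded wait gives up and returns False.
timed_out_result = wait_for_analysis(analysed, timeout=0.1)

# Once the event is set, the wait returns True immediately.
analysed.set()
set_result = wait_for_analysis(analysed, timeout=0.1)

print(timed_out_result, set_result)  # False True
```

Calling wait_for_analysis(event) with no timeout on an event that is never set would block indefinitely, which is why the sketch only ever passes a finite timeout.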
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Resolved by pull request 73
Original report (archived issue) by Philip Starkey (Bitbucket: pstarkey, GitHub: pstarkey).
Calls to device.transition_to_manual by the queue manager are currently serialised. I believe this is an artefact of a version of BIAS which did not use zlock/h5lock and would thus cause HDF5 file corruption if BIAS wrote to the file while the lock was held by one of the BLACS devices. However, I believe this is no longer the case (need to double check) and that BIAS uses zlock/h5lock correctly now.
Regardless, this introduces an unfortunate error when using multiple acquisition cards with a wait monitor. If the device with the wait monitor is not transitioned to manual prior to all acquisition devices, then those acquisition devices will time out waiting for the zprocess.Event to be issued indicating that waits have been processed (since that processing has not happened yet, and can't until the other acquisition device completes). This is effectively a deadlock, except that fortunately the zprocess event times out (although then the shot is aborted due to that failure). This of course may work for some people, as the transition order is currently dependent on the iteration order of a dictionary, which is itself dependent on the names of each device in use along with the number of devices in use (see resources on how dictionary hashing works in Python). I believe @rpanderson recently experienced this bug when he visited JQI.
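The order dependence can be reproduced with a toy model. The sketch below is not BLACS code: the device names, the function transition_serially, and the use of threading.Event in place of zprocess.Event are all assumptions made for illustration. Devices are transitioned one at a time, and any acquisition device reached before the wait monitor's device times out.

```python
import threading


def transition_serially(order, wait_monitor="wait_monitor", timeout=0.05):
    """Transition devices one at a time; return {device name: succeeded}."""
    waits_analysed = threading.Event()  # stand-in for the zprocess event
    results = {}
    for name in order:
        if name == wait_monitor:
            # The wait monitor's device analyses the waits and posts the event.
            waits_analysed.set()
            results[name] = True
        else:
            # Acquisition devices block until the waits have been analysed.
            # In a serial loop nothing else can run meanwhile, so if the
            # event is not already set this can only time out.
            results[name] = waits_analysed.wait(timeout)
    return results


good = transition_serially(["wait_monitor", "acq_a", "acq_b"])
bad = transition_serially(["acq_a", "wait_monitor", "acq_b"])

print(good)  # {'wait_monitor': True, 'acq_a': True, 'acq_b': True}
print(bad)   # {'acq_a': False, 'wait_monitor': True, 'acq_b': True}
```

In the second ordering acq_a fails even though nothing is wrong with the device itself; only its position in the iteration differs.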
Anyway, the solution is to deserialise transition_to_manual
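As a rough illustration of why running the transitions concurrently removes the order dependence, here is a toy sketch using a thread pool. It is not the change made in BLACS (the pull request's contents are not described here); the device names, the function transition_concurrently, and the threading.Event stand-in for zprocess.Event are assumptions for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def transition_concurrently(order, wait_monitor="wait_monitor", timeout=2.0):
    """Start every device's transition at once; return {device name: succeeded}."""
    waits_analysed = threading.Event()  # stand-in for the zprocess event

    def transition(name):
        if name == wait_monitor:
            # The wait monitor's device posts the event when analysis is done.
            waits_analysed.set()
            return True
        # Acquisition devices wait, but the wait monitor's device is running
        # in parallel, so the event arrives regardless of start order.
        return waits_analysed.wait(timeout)

    # One worker per device so that no transition queues behind a blocked one.
    with ThreadPoolExecutor(max_workers=len(order)) as pool:
        outcomes = list(pool.map(transition, order))
    return dict(zip(order, outcomes))


# The wait monitor is listed last, the worst case for a serial loop.
results = transition_concurrently(["acq_a", "acq_b", "wait_monitor"])
print(results)  # {'acq_a': True, 'acq_b': True, 'wait_monitor': True}
```

The pool needs at least as many workers as devices here; with fewer, blocked acquisition devices could occupy every worker and the wait monitor's transition would never start before the timeout.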