labscript-suite-temp / blacs

BLACS, part of the labscript suite, provides an interface to hardware used to control a buffered experiment. It manages a queue of shots to be run as well as providing manual control over devices between shots.

calls to transition_to_manual are serialised by the queue manager #34

Closed philipstarkey closed 5 years ago

philipstarkey commented 6 years ago

Original report (archived issue) by Philip Starkey (Bitbucket: pstarkey, GitHub: pstarkey).


Calls to device.transition_to_manual by the queue manager are currently serialised.

I believe this is an artefact of a version of BIAS which did not use zlock/h5lock and would thus cause HDF5 file corruption if BIAS wrote to the file while the lock was held by one of the BLACS devices. However, I believe this is no longer the case (need to double check) and that BIAS uses zlock/h5lock correctly now.

Regardless, this introduces an unfortunate error when using multiple acquisition cards with a wait monitor. If the device with the wait monitor is not transitioned to manual before all of the other acquisition devices, those acquisition devices will time out waiting for the zprocess.Event that indicates the waits have been processed (that processing has not happened yet, and can't until the other acquisition device completes). This is effectively a deadlock, except that fortunately the zprocess event times out (although the shot is then aborted due to that failure). This may of course work for some people, as the transition order currently depends on the iteration order of a dictionary, which in turn depends on the names of the devices in use and on how many devices there are (see resources on how dictionary hashing works in Python).
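The ordering problem can be reproduced in isolation. The following is only a minimal sketch, not BLACS code: `threading.Event` stands in for the zprocess.Event, and the device names, the `run_serial` helper and the 0.2 s timeout are all hypothetical.

```python
import threading


def run_serial(order, timeout=0.2):
    """Transition hypothetical devices one at a time, in the given order."""
    # Stand-in for the zprocess.Event that signals the waits were processed.
    waits_analysed = threading.Event()
    results = {}
    for name in order:
        if name == "wait_monitor":
            # The wait monitor device processes the waits and posts the event.
            waits_analysed.set()
            results[name] = True
        else:
            # Acquisition devices block until the event is posted,
            # or give up after `timeout` seconds (result is then False).
            results[name] = waits_analysed.wait(timeout)
    return results


good = run_serial(["wait_monitor", "acq_a", "acq_b"])
bad = run_serial(["acq_a", "wait_monitor", "acq_b"])
print(good)
print(bad)
```

With the wait monitor first, every device succeeds. With `acq_a` first, `acq_a` times out because nothing can post the event while the loop is blocked on it.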

I believe @rpanderson recently experienced this bug when he visited JQI.

Anyway, the solution is to deserialise transition_to_manual, i.e. have the queue manager call it on all devices concurrently rather than one at a time.
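As a rough illustration of why running the transitions concurrently removes the ordering dependence, here is a sketch using plain threads. It is not the actual fix: the `run_concurrent` helper, the device names and the use of `threading.Event` in place of the zprocess.Event are assumptions made for the example.

```python
import threading


def run_concurrent(order, timeout=2.0):
    """Start every hypothetical device's transition at once, then wait for all."""
    # Stand-in for the zprocess.Event that signals the waits were processed.
    waits_analysed = threading.Event()
    results = {}

    def transition(name):
        if name == "wait_monitor":
            # The wait monitor device posts the event once waits are processed.
            waits_analysed.set()
            results[name] = True
        else:
            # Acquisition devices wait for the event; False means a timeout.
            results[name] = waits_analysed.wait(timeout)

    threads = [threading.Thread(target=transition, args=(name,)) for name in order]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# The order that failed when run serially now succeeds.
outcome = run_concurrent(["acq_a", "wait_monitor", "acq_b"])
print(outcome)
```

Because the wait monitor's transition no longer has to queue behind a blocked acquisition device, the event is posted regardless of the order in which the devices are listed.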

philipstarkey commented 6 years ago

Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


> This is effectively a deadlock, except that fortunately the zprocess event times-out (although then the shot is aborted due to that failure).

This depends on the device class. For the NI-PCIe-6363, the wait_durations_analysed event has no timeout, since the wait method of zprocess.Event does not time out by default. This means that wait_durations_analysed.wait(self.h5file) hangs indefinitely if the deadlock occurs.

Another example from the spielman fork is the NI-USB-6343.
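The difference between a wait with and without a timeout can be seen with a small sketch. It uses `threading.Event` purely as an analogy and does not reproduce the zprocess.Event signature; the `waiter` function and the 0.2 s durations are hypothetical.

```python
import threading

# An event that nothing ever posts, as in the deadlocked case.
never_posted = threading.Event()


def waiter():
    # No timeout given: this call blocks for as long as the event is unset.
    never_posted.wait()


# Daemon thread so the blocked waiter does not keep the interpreter alive.
thread = threading.Thread(target=waiter, daemon=True)
thread.start()
thread.join(0.2)
still_blocked = thread.is_alive()

# With a timeout, the wait returns False instead of blocking forever.
timed_out = not never_posted.wait(0.2)

print(still_blocked, timed_out)
```

A device class that waits without a timeout behaves like `waiter` and hangs, while one that passes a timeout gets a failure it can report.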

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Resolved by pull request 73

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).