Open philipstarkey opened 7 years ago
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
It is very hard to tell a very bad error from a not-so-bad one unless the programmer anticipates each one and flags it as such. Even so, we could add an option like "if encountering an error, restart the tab in question and retry the shot", followed by something like "if there are still errors, go into repeat mode with a default shot". The default shot would likely be obtained by asking runmanager to submit a shot with all the default values, once the "default values" feature is implemented there.
There could be increasingly aggressive recovery attempts each time one fails: first restart the offending tab, then restart all tabs, then do the "reset of hardware" functionality you mentioned in another feature request. Beyond that, even if there are errors during transition_to_static, keep running anyway to keep the experiment cycling. If there are persistent errors during transition_to_buffered, and restarting all tabs and doing hardware resets doesn't fix them, then there is no recourse left and BLACS will have to stop.
All the errors would have to be logged, and the GUI should prominently display that something went wrong earlier, even if recovery was possible.
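To make the escalation idea concrete, here is a minimal sketch in plain Python. It is not BLACS code: `run_with_escalation`, `run_shot`, and the recovery callables are hypothetical names invented for illustration, and the real tab-restart and hardware-reset hooks would have to be supplied by BLACS itself.

```python
import logging

logger = logging.getLogger("recovery_sketch")  # hypothetical logger name


def run_with_escalation(run_shot, recovery_steps):
    """Run a shot, escalating through recovery steps after each failure.

    run_shot: callable that raises an exception if the shot fails.
    recovery_steps: list of (name, callable) pairs, ordered from least to
        most aggressive (e.g. restart one tab, restart all tabs, reset
        hardware). These are assumptions, not existing BLACS functions.

    Returns (succeeded, errors), where errors lists every failure seen so
    the GUI can still report them after a successful recovery.
    """
    errors = []
    # The first attempt has no recovery step; each later attempt is
    # preceded by the next, more aggressive, step.
    attempts = [("initial attempt", None)] + list(recovery_steps)
    for name, step in attempts:
        try:
            if step is not None:
                logger.warning("Applying recovery step: %s", name)
                step()
            run_shot()
            return True, errors
        except Exception as exc:
            # Log and remember the failure, then move to the next step.
            logger.error("Shot failed after %s: %s", name, exc)
            errors.append((name, str(exc)))
    # Every step has been tried: no recourse left.
    return False, errors
```

A caller would stop submitting shots only when the first return value is `False`, and would surface the `errors` list in the GUI in either case.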
Original report (archived issue) by Ian B. Spielman (Bitbucket: Ian Spielman).
Currently BLACS stops on all error conditions. This is bad behavior. Our system requires that it be running constantly to stay in a stable "warm" configuration. So BLACS should not stop submitting shots unless an "end of the world" bad event has occurred.
There are different degrees of badness. For example, in some cases a camera will miss an image for some reason. For this type of error, BLACS should just ignore it (or perhaps retry the shot).
BLACS should switch to a "safe" script if too many errors accumulate.
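One possible way to express "degrees of badness" plus an error threshold is sketched below. All names here (`Severity`, `Action`, `ErrorBudget`) and the default numbers are hypothetical and chosen only for illustration; nothing in it is existing BLACS API, and how errors would actually be classified is left open.

```python
from collections import deque
from enum import Enum


class Severity(Enum):
    MINOR = "minor"  # e.g. a camera missed an image
    FATAL = "fatal"  # an "end of the world" event


class Action(Enum):
    CONTINUE = "continue"        # keep submitting normal shots
    SAFE_SCRIPT = "safe_script"  # switch to the "safe" script
    STOP = "stop"                # stop submitting shots


class ErrorBudget:
    """Track errors over the most recent shots and decide what to do next."""

    def __init__(self, max_errors=3, window=10):
        # Assumed policy: more than max_errors failed shots among the
        # last `window` shots triggers the safe script.
        self.max_errors = max_errors
        self.recent = deque(maxlen=window)  # True for each shot with an error

    def record_shot(self, severity=None):
        """Record one shot; severity is None when the shot had no error."""
        if severity is Severity.FATAL:
            return Action.STOP
        self.recent.append(severity is not None)
        if sum(self.recent) > self.max_errors:
            return Action.SAFE_SCRIPT
        return Action.CONTINUE
```

Using a sliding window rather than a lifetime counter means that an occasional missed image never forces the safe script, while a burst of failures does.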