Closed tangkong closed 1 year ago
The most important thing here is actually that devices that don't connect prevent the UI from updating, seemingly forever.
For example, switching to the QRIX line freezes the GUI. The default line right now (XPP) doesn't open while the TXI switch work is ongoing.
It was observed that a single disconnected device could result in a huge number of disconnected device errors, thrown from find_device_state
.
At every point where we ask for a LightRow
widget to update its color, we consider its position in relation to the path as a whole. Because of this, if a disconnected device is the impediment, lightpath will call find_device_state
on it for each device, causing the unexpectedly large number of connection failures.
While this may not be a new problem, it's certainly exacerbated by the switch to considering the whole path for each widget.
Thoughts on possible solutions:
It should be pretty quick to check that a device is disconnected, right? (~device~ lightpath_signal.connected attribute). Can we mark these with an Unknown/Disconnected state and move on? I'm not sure whether unknown devices are supposed to count as blocking or not. Hypothetically these could be Unknown/Disconnected and be marked as... not blocking? (Until they connect and we can get a real update?)
You're right in that checking the connection status explicitly should be fast. We just catch an exception, rather than actually checking the connection status. This leads to a delay that is made worse by the timeout setting I mentioned earlier, if I remove that we get a the errors come in a lot faster.
Maybe we just add a connection check to find_device_state
before trying to get the value. That does sound easiest.
We currently mark unknown devices as blocking, and I think I'm happy with that behavior. I suppose the disconnected status could be obvious enough with the new coloring.
Expected Behavior
We should be able to configure the EpicsSignal default timeout at startup, or rely on an environment variable
Current Behavior
Currently we set the default EpicsSignal timeout to 10s, which worked around various
TimeoutErrors
andpyepics failed to send connection and access rights
. This works fine when all the devices are online, but devices are actually disconnected, this can cause large slowdowns when starting up lightpath (or when switching beamlines).Furthermore, switching to a beamline with disconnected devices seems to freeze the ui, as all disconnected devices time out at one of these function (
find_device_state
) calls in series before any device is shown in lightpath.https://github.com/pcdshub/lightpath/blob/05a9394e09acf256649738f6233d54a7ba2b1b4d/lightpath/path.py#L83
Possible Solution
Look for the timeout setting in an env variable, or add a cli argument to specify it.
Also handle disconnected devices more gracefully.
Context
This was first encountered in #142, and brought up again in #150 while testing during a mini-shutdown against a switch that was powered off, disabling a large number of devices.
Your Environment
During testing of #150