Open BrianJKoopman opened 1 month ago
Thanks for this. The correct behavior is probably to catch this in the monitor_state
process and mark it as degraded... and also raise a flag to make sure none of the spin-up commands can run.
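A rough sketch of what catching this in the monitor could look like, not the actual socs code: `poll_clients`, `spin_up_allowed`, and the snapshot layout are hypothetical names made up for illustration. The idea is only that a failed client read gets recorded and flips a degraded flag instead of propagating and killing the process.

```python
from typing import Callable, Dict


def poll_clients(readers: Dict[str, Callable[[], dict]]) -> dict:
    """Call each (hypothetical) client reader once.

    Failures are recorded in the snapshot instead of being raised, so one
    unreachable client cannot take down the whole monitoring loop.
    """
    snapshot = {"data": {}, "errors": {}, "degraded": False}
    for name, read in readers.items():
        try:
            snapshot["data"][name] = read()
        except Exception as exc:
            # Broad on purpose in this sketch: any client failure means the
            # state can no longer be trusted.
            snapshot["errors"][name] = repr(exc)
            snapshot["degraded"] = True
    return snapshot


def spin_up_allowed(snapshot: dict) -> bool:
    """Refuse spin-up whenever the last poll was degraded."""
    return not snapshot["degraded"]
```

A spin-up task would then check `spin_up_allowed(...)` first and bail out early with a clear message if it returns `False`.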
I think it might make sense to move the safety check logic from the control-update function into properties of the HWPState object, such as spin_up_safe and grip_safe, that check internal state variables like this and return a bool. (I don't think the UPS state is currently checked anywhere.)
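Something along these lines, as a minimal sketch only. This `HWPState` is a trimmed-down stand-in, and every field name and the stop threshold are assumptions for illustration, not what the real class in socs contains. The point is that each property treats missing or stale information as unsafe and returns a plain bool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HWPState:
    """Hypothetical stand-in for the supervisor's state object."""
    # All fields are assumed names; None means "never reported".
    clients_connected: bool = False
    ups_on_battery: Optional[bool] = None
    gripper_engaged: Optional[bool] = None
    rotation_freq_hz: Optional[float] = None

    # Placeholder threshold, not a value taken from the real system.
    STOPPED_FREQ_HZ = 0.01

    @property
    def degraded(self) -> bool:
        """True when state cannot be trusted (no clients or no UPS report)."""
        return (not self.clients_connected) or self.ups_on_battery is None

    @property
    def spin_up_safe(self) -> bool:
        """Spin-up only on mains power with the gripper known to be released."""
        if self.degraded:
            return False
        return self.ups_on_battery is False and self.gripper_engaged is False

    @property
    def grip_safe(self) -> bool:
        """Gripping only when the rotor is known to be (nearly) stopped."""
        if self.degraded or self.rotation_freq_hz is None:
            return False
        return abs(self.rotation_freq_hz) < self.STOPPED_FREQ_HZ
```

With this shape, a freshly constructed state that has never heard from its clients reports both checks as `False`, which is the fail-safe default you would want when the supervisor cannot connect to anything.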
I was helping satp2 try to recover their HWP system this morning and found the supervisor agent in this state:
It seems like it wasn't able to connect to any of the clients, so when monitor goes to grab state info it hits this raise, which it doesn't handle: https://github.com/simonsobs/socs/blob/33b1e1d82d367829a9273222d374a0219b151801/socs/agents/hwp_supervisor/agent.py#L442
EDIT: This was on socs image: v0.5.1-22-g7d2f158-dev