Open BrianJKoopman opened 3 months ago
It sounds like there might be some hardware-related issues contributing, but the HWPPIDAgent is one of the commonly crashing agents, and a good one to take a look at next for anyone interested. See https://github.com/simonsobs/chwp-discussions/discussions/21.
@BrianJKoopman the ibootbarAgent and the UPSAgent should already be robust to connection dropouts.
Related to discussion https://github.com/simonsobs/socs/discussions/538, we need to make sure agents are robust against connection dropouts, whether that's a network interruption, serial connection dropout, or otherwise.
I'm making a single issue for this to avoid spamming 50+ individual issues. If you would like to contribute, read on!
Contributing
If you would like to work on "robustifying" an agent, please create an issue with a title like "Make \<AGENT NAME> robust to connection dropouts" and assign yourself. This will help us keep track of which agents are actively being worked on, and which are up for grabs. I will link relevant issues/PRs in the list below.
When you're ready, PR your code changes and link the associated issue. Once merged I'll update the checklist below.
The focus of these changes should be on the "main" processes within each agent, as these are most impacted by a lack of connection-related error handling. Bonus points if you update tasks to handle errors.
(If you know any of the agents to already be robust, please comment here.)
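For anyone unsure what "robust" means in practice for a "main" process, here is a rough sketch of the general pattern: catch connection errors inside the acquisition loop, mark the device as disconnected, and reconnect on the next pass rather than letting the exception kill the process. Everything here is hypothetical and for illustration only: `FlakyDevice`, `acq`, `retry_wait`, and `max_retries` are made-up names, not socs or ocs APIs, and a real agent would use its own device driver and session handling.

```python
import time


class FlakyDevice:
    """Hypothetical stand-in for a hardware connection (not a socs class)."""

    def __init__(self, fail_on=()):
        self.fail_on = set(fail_on)  # read counts on which to simulate a dropout
        self.reads = 0
        self.connects = 0
        self.connected = False

    def connect(self):
        self.connected = True
        self.connects += 1

    def read(self):
        self.reads += 1
        if not self.connected or self.reads in self.fail_on:
            self.connected = False
            raise ConnectionError("simulated dropout")
        return 42.0


def acq(device, n_samples, retry_wait=0.0, max_retries=5):
    """Collect n_samples readings, reconnecting after dropouts instead of crashing."""
    data = []
    retries = 0
    while len(data) < n_samples:
        try:
            if not device.connected:
                device.connect()
            data.append(device.read())
            retries = 0  # a successful read resets the retry counter
        except (ConnectionError, TimeoutError):
            retries += 1
            if retries > max_retries:
                raise  # give up only after repeated consecutive failures
            time.sleep(retry_wait)  # back off before trying to reconnect
    return data


if __name__ == "__main__":
    dev = FlakyDevice(fail_on={2})  # second read simulates a dropout
    print(acq(dev, 3), "connects:", dev.connects)
```

A real agent would probably not give up after a fixed number of retries, and might instead keep retrying indefinitely while reporting a degraded state. That choice is left to whoever works on each agent.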
Robust Agents Checklist
Lab-only Agents (lower priority)