simonsobs / socs

Simons Observatory specific OCS agents.
BSD 2-Clause "Simplified" License

Make Agents robust to connection dropouts #721

Status: Open. BrianJKoopman opened this issue 3 months ago.

BrianJKoopman commented 3 months ago

Related to discussion https://github.com/simonsobs/socs/discussions/538, we need to make sure agents are robust against connection dropouts, whether that's a network interruption, serial connection dropout, or otherwise.
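One common pattern for surviving a dropout is to retry the connection with exponential backoff instead of failing on the first error. Below is a minimal sketch, assuming the connection is opened by a zero-argument callable that raises `OSError` (which covers `ConnectionError` and `TimeoutError`) on failure. The name `connect_with_backoff` and its parameters are hypothetical, not part of socs or OCS.

```python
import logging
import time
from typing import Callable, TypeVar

log = logging.getLogger(__name__)
T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``connect`` until it succeeds, doubling the wait after each failure.

    Hypothetical helper: ``connect`` is any zero-argument callable that opens a
    connection and raises ``OSError`` (e.g. ``ConnectionError`` or
    ``TimeoutError``) on failure. The last error is re-raised once
    ``max_attempts`` is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as err:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            log.warning("Connection attempt %d failed (%s); retrying in %.1f s",
                        attempt, err, delay)
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise AssertionError("unreachable")
```

For a TCP device, `connect` could be something like `lambda: socket.create_connection((host, port), timeout=5)`, where `host` and `port` are placeholders. Passing `sleep` in as a parameter keeps the helper testable without real delays.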

I'm making a single issue for this to avoid spamming 50+ individual issues. If you would like to contribute, read on!

Contributing

If you would like to work on "robustifying" an agent, please create an issue with a title like "Make <AGENT NAME> robust to connection dropouts" and assign yourself. This will help us keep track of which agents are actively being worked on, and which are up for grabs. I will link relevant issues/PRs in the list below.

When you're ready, PR your code changes and link the associated issue. Once merged I'll update the checklist below.

The focus of these changes should be on the "main" processes within each agent, as these are the most affected by the lack of connection-related error handling. Bonus points if you also update tasks to handle errors.
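For the "main" processes, one approach (a sketch, not an existing socs helper) is to catch connection errors inside the acquisition loop, drop the broken connection, and reconnect on the next pass instead of letting the exception end the process. `run_main_loop`, `reconnect`, `read`, `publish`, and `should_run` are hypothetical stand-ins for an agent's own connection, query, data-publishing, and stop-request handling.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger(__name__)


def run_main_loop(
    reconnect: Callable[[], object],
    read: Callable[[object], dict],
    publish: Callable[[dict], None],
    should_run: Callable[[], bool],
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll a device until ``should_run()`` returns False, surviving dropouts.

    Hypothetical helper: a connection error (any ``OSError``) during connect
    or read does not end the loop; the broken connection is discarded and a
    fresh one is opened on the next iteration.
    """
    conn: Optional[object] = None
    while should_run():
        try:
            if conn is None:
                conn = reconnect()
                log.info("Connected")
            publish(read(conn))
        except OSError as err:
            log.warning("Connection problem (%s); will reconnect", err)
            conn = None  # force a fresh connection on the next pass
        sleep(interval)
```

In a real agent, `reconnect` could itself retry with a backoff so that a long outage does not hammer the device, and the loop would also be a natural place to report the connection state.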

(If you know any of the agents to already be robust, please comment here.)

Robust Agents Checklist

Lab-only Agents (lower priority)

BrianJKoopman commented 3 months ago

It sounds like hardware-related issues might be contributing, but the HWPPIDAgent is one of the agents that crashes most often, and a good one for anyone interested to look at next. See https://github.com/simonsobs/chwp-discussions/discussions/21.

davidvng commented 2 months ago

@BrianJKoopman the ibootbarAgent and the UPSAgent should already be robust to connection dropouts.