simonsobs / socs

Simons Observatory specific OCS agents.
BSD 2-Clause "Simplified" License

ACTi Camera network timeout #682

Open BrianJKoopman opened 1 month ago

BrianJKoopman commented 1 month ago

There was a network timeout today that caused the ACTi Camera Agent to stop collecting screenshots.

2024-05-29T14:33:09Z    2024-05-29T14:33:09+0000 Grabbing screenshot from c3_front
2024-05-29T14:33:10Z    2024-05-29T14:33:10+0000 Grabbing screenshot from c2_front
2024-05-29T14:33:10Z    2024-05-29T14:33:10+0000 Grabbing screenshot from act_highbay
2024-05-29T14:33:11Z    2024-05-29T14:33:11+0000 Grabbing screenshot from c4_front
2024-05-29T14:33:11Z    2024-05-29T14:33:11+0000 Grabbing screenshot from highbay_back
2024-05-29T14:34:07Z    2024-05-29T14:34:07+0000 Grabbing screenshot from highbay_cargo
2024-05-29T14:34:08Z    2024-05-29T14:34:08+0000 Grabbing screenshot from site_entrance
2024-05-29T14:34:09Z    2024-05-29T14:34:09+0000 Grabbing screenshot from pumphouse
2024-05-29T14:34:14Z    2024-05-29T14:34:14+0000 acq:1 CRASH: [Failure instance: Traceback: <class 'urllib3.exceptions.ReadTimeoutError'>: HTTPConnectionPool(host='192.168.65.37', port=80): Read timed out.
2024-05-29T14:34:14Z    /usr/lib/python3.8/threading.py:932:_bootstrap_inner
2024-05-29T14:34:14Z    /usr/lib/python3.8/threading.py:870:run
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/twisted/_threads/_threadworker.py:49:work
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/twisted/_threads/_team.py:192:doWork
2024-05-29T14:34:14Z    --- <exception caught here> ---
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/twisted/python/threadpool.py:269:inContext
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/twisted/python/threadpool.py:285:<lambda>
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/twisted/python/context.py:117:callWithContext
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/twisted/python/context.py:82:callWithContext
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/ocs/ocs_agent.py:984:_running_wrapper
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/socs/agents/acti_camera/agent.py:137:acq
2024-05-29T14:34:14Z    /usr/lib/python3.8/shutil.py:205:copyfileobj
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/urllib3/response.py:544:read
2024-05-29T14:34:14Z    /usr/lib/python3.8/contextlib.py:131:__exit__
2024-05-29T14:34:14Z    /usr/local/lib/python3.8/dist-packages/urllib3/response.py:446:_error_catcher
2024-05-29T14:34:14Z    ]
2024-05-29T14:34:14Z    2024-05-29T14:34:14+0000 acq:1 Status is now "done".

We should catch the timeout error in the acq process and continue to the next iteration of the loop, so the agent stays online and resumes collecting screenshots once the network issue resolves.
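A minimal sketch of that behaviour, not the agent's actual code: the traceback shows the error surfacing inside `shutil.copyfileobj` while reading the response, so the `try` wraps that call. The names `grab_screenshots`, `fetch`, and `sink_for` are hypothetical stand-ins for the agent's own loop, HTTP request, and output file handling. The default `timeout_errors` uses the builtin `TimeoutError` only to keep the sketch self-contained; in the agent the tuple would presumably need to include `urllib3.exceptions.ReadTimeoutError`, the class named in the traceback, which is not a subclass of the builtin.

```python
import io
import shutil


def grab_screenshots(cameras, fetch, sink_for,
                     timeout_errors=(TimeoutError,), log=print):
    """Try each camera once, skipping any camera whose read times out.

    cameras        -- iterable of camera names
    fetch          -- hypothetical callable: name -> readable binary stream
    sink_for       -- hypothetical callable: name -> writable binary stream
    timeout_errors -- exception classes treated as transient network failures
    Returns the names of the cameras that were skipped.
    """
    skipped = []
    for name in cameras:
        log(f"Grabbing screenshot from {name}")
        try:
            # The read happens inside copyfileobj, so the timeout is raised here.
            shutil.copyfileobj(fetch(name), sink_for(name))
        except timeout_errors as exc:
            # Log and move on instead of letting the whole process crash.
            log(f"Timeout grabbing screenshot from {name}: {exc}")
            skipped.append(name)
            continue
    return skipped


class StalledStream:
    """Fake response body whose read() always times out (for the demo only)."""

    def read(self, *args):
        raise TimeoutError("Read timed out.")


if __name__ == "__main__":
    outputs = {}

    def fake_fetch(name):
        return StalledStream() if name == "pumphouse" else io.BytesIO(b"image")

    def fake_sink(name):
        outputs[name] = io.BytesIO()
        return outputs[name]

    # One pass over three cameras; the stalled one is skipped, the rest succeed.
    print(grab_screenshots(["site_entrance", "pumphouse", "c3_front"],
                           fake_fetch, fake_sink))
```

Calling this once per acquisition cycle means a camera that timed out is simply retried on the next cycle. A timeout partway through the copy can leave a partial image in the sink, so the real agent may also want to discard or overwrite that file.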