simonsobs / socs

Simons Observatory specific OCS agents.
BSD 2-Clause "Simplified" License

HWP-PID: ValueError when multiple values are returned because of timeout #651

Closed ykyohei closed 2 months ago

ykyohei commented 3 months ago

Error log of satp3

2024-03-27T12:50:31+0000 main:0 CRASH: [Failure instance: Traceback: <class 'ValueError'>: could not convert string to float: '0.001\rX010.001' 
2024-03-27T12:50:31+0000 ['X010.001\rX010.001'] 
2024-03-27T12:50:30+0000 Caught timeout waiting for response from PID controller. Trying again... 
2024-03-27T12:50:27+0000 Finding CHWP Frequency 
2024-03-27T12:50:24+0000 Direction = Forward 
2024-03-27T12:50:23+0000 Finding CHWP Direction 
2024-03-27T12:50:23+0000 Setpoint = 0.0 
2024-03-27T12:50:23+0000 ['R01400000'] 
2024-03-27T12:50:23+0000 Finding target CHWP Frequency 
2024-03-27T12:50:23+0000 ['X010.001'] 
2024-03-27T12:50:22+0000 Finding CHWP Frequency

ykyohei commented 3 months ago

Another type of crash

2024-03-27T17:12:03+0000 main:0 CRASH: [Failure instance: Traceback: <class 'RuntimeError'>: Could not connect to PID controller
2024-03-27T17:12:03+0000 Failed to connect to device at 192.168.13.33:2001
2024-03-27T17:12:03+0000 Failed to connect to device at 192.168.13.33:2001
2024-03-27T17:12:03+0000 Failed to connect to device at 192.168.13.33:2001
2024-03-27T17:12:01+0000 Resetting connection
2024-03-27T17:12:00+0000 Caught timeout waiting for response from PID controller. Trying again...
2024-03-27T17:11:56+0000 Caught timeout waiting for response from PID controller. Trying again...
2024-03-27T17:11:54+0000 Finding CHWP Frequency
2024-03-27T17:11:50+0000 Direction = Forward
2024-03-27T17:11:49+0000 Finding CHWP Direction

ykyohei commented 3 months ago

Another type of crash

2024-03-27T23:07:33+0000 main:0 CRASH: [Failure instance: Traceback: <class 'ValueError'>: invalid literal for int() with base 16: '00000\rR02400000'
2024-03-27T23:07:31+0000 Caught timeout waiting for response from PID controller. Trying again...
2024-03-27T23:07:29+0000 Finding CHWP Direction
2024-03-27T23:07:29+0000 Setpoint = 0.0
2024-03-27T23:07:29+0000 ['R01400000']
2024-03-27T23:07:28+0000 Finding target CHWP Frequency
2024-03-27T23:07:28+0000 ['X010.001']
2024-03-27T23:07:28+0000 Finding CHWP Frequency

ykyohei commented 2 months ago

Another example

2024-04-29T04:21:53+0000 main:0 CRASH: [Failure instance: Traceback: <class 'ValueError'>: invalid literal for int() with base 16: '007D0\rR014007D0'
2024-04-29T04:21:52+0000 Caught timeout waiting for response from PID controller. Trying again...
2024-04-29T04:21:49+0000 Finding target CHWP Frequency
2024-04-29T04:21:49+0000 ['X012.000']
2024-04-29T04:21:49+0000 Finding CHWP Frequency
2024-04-29T04:21:45+0000 Direction = Forward
2024-04-29T04:21:45+0000 Finding CHWP Direction
2024-04-29T04:21:45+0000 Setpoint = 2.0
2024-04-29T04:21:45+0000 ['R014007D0']

jlashner commented 2 months ago

Brian recently responded to a similar issue in a different agent here: https://github.com/simonsobs/daq-discussions/discussions/75#discussioncomment-9228784

I think we may want to check for the \r character inside the received message and, if it's there, either take the last segment as the incoming data (since that is the newest one and corresponds to the most recent query) or re-query the device.
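
A rough sketch of what that check could look like, not the agent's actual code. The function names are hypothetical, and the `X01` / `R014` prefixes and the hex-value-divided-by-1000 setpoint scaling are assumptions inferred from the logs above (e.g. `R014007D0` logged alongside `Setpoint = 2.0`). Since one of the crashes shows the trailing segment answering a different query (`R02400000` where the setpoint reply was expected), the sketch only accepts the last segment when it carries the expected prefix, and otherwise returns `None` so the caller can re-query.

```python
from typing import Optional


def newest_payload(raw: str, expected_prefix: str, sep: str = "\r") -> Optional[str]:
    """Return the payload of the newest segment, or None if a re-query is needed.

    A reply delayed by a timeout can arrive concatenated with the next one,
    e.g. 'X010.001\rX010.001'. Only the last segment is considered, and only
    if it starts with the prefix expected for the current query.
    """
    segments = [s for s in raw.split(sep) if s]
    if not segments:
        return None
    last = segments[-1]
    if not last.startswith(expected_prefix):
        # The last segment answers some other query; do not guess.
        return None
    return last[len(expected_prefix):]


def parse_frequency(raw: str) -> Optional[float]:
    """Parse a frequency reply such as 'X010.001' (prefix 'X01' is assumed)."""
    payload = newest_payload(raw, "X01")
    if payload is None:
        return None
    try:
        return float(payload)
    except ValueError:
        return None  # malformed payload, caller should re-query


def parse_setpoint(raw: str) -> Optional[float]:
    """Parse a setpoint reply such as 'R014007D0' (prefix and scaling assumed)."""
    payload = newest_payload(raw, "R014")
    if payload is None:
        return None
    try:
        return int(payload, 16) / 1000
    except ValueError:
        return None  # malformed payload, caller should re-query
```

With this, `parse_frequency('X010.001\rX010.001')` yields a float instead of raising, and `parse_setpoint('R01400000\rR02400000')` returns `None`, which the caller would treat as a signal to query the device again.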