simonsobs / sodetlib

Tools for performing core instrument testing, quality control, and analysis tasks.
BSD 2-Clause "Simplified" License

Not enough time after loading defaults before trying to query SmurfVersion #355

Open samdayweiss opened 1 year ago

samdayweiss commented 1 year ago

This is @jlashner posting from Sam's account.

It seems that when running setup, there is not enough time after setting defaults before the SmurfVersion is queried. In the logs below, the Smurf version needs to be queried 5 times before it returns a value, and this has recently caused the setup to crash entirely (as seen here: https://github.com/simonsobs/sodetlib/discussions/353#discussion-5290857). After setup fails, though, I'm able to caget the SmurfVersion, so I think it's just a matter of waiting longer or setting a longer timeout for this one particular caget call.

[ 2023-06-14 15:51:40 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:AppCore:MicrowaveMuxCore[1]:DBG:dacReset[1] 0
[ 2023-06-14 15:51:40 ]  caput smurf_server_s2:AMCc:ReadAll 1
[ 2023-06-14 15:51:44 ]  Waiting 20.00 seconds after...
[ 2023-06-14 15:52:04 ]  Done waiting.
[ 2023-06-14 15:52:04 ]  caget smurf_server_s2:AMCc:SmurfApplication:SmurfVersion
[ 2023-06-14 15:52:04 ]  0+unknown
[ 2023-06-14 15:52:04 ]  caput smurf_server_s2:AMCc:setDefaults 1
[ 2023-06-14 15:52:34 ]  Waiting 30.00 seconds after...
CA.Client.Exception...............................................
    Warning: "Virtual circuit unresponsive"
    Context: "localhost:34607"
    Source File: ../tcpiiu.cpp line 919
    Current Time: Wed Jun 14 2023 15:52:39.569473338
..................................................................
[ 2023-06-14 15:53:04 ]  Done waiting.
[ 2023-06-14 15:53:04 ]  caget smurf_server_s2:AMCc:SmurfApplication:SmurfVersion
/usr/local/lib/python3.8/dist-packages/epics/ca.py:1528: UserWarning: ca.get('smurf_server_s2:AMCc:SmurfApplication:SmurfVersion') timed out after 1.95 seconds.
  warnings.warn(msg % (name(chid), timeout))
[ 2023-06-14 15:53:06 ]  Command failed: smurf_server_s2:AMCc:SmurfApplication:SmurfVersion
[ 2023-06-14 15:53:06 ]  Retry attempt 1 of 5
[ 2023-06-14 15:53:08 ]  Retry attempt 2 of 5
[ 2023-06-14 15:53:10 ]  Retry attempt 3 of 5
[ 2023-06-14 15:53:12 ]  Retry attempt 4 of 5
[ 2023-06-14 15:53:14 ]  Retry attempt 5 of 5
[ 2023-06-14 15:53:15 ]  0+unknown
[ 2023-06-14 15:53:15 ]  The `SmurfApplication:JesdStatus` register is not implemented for pysmurf core code versions <4.1.0 (current version is 0).

jlashner commented 1 year ago

Looking at old issues like this one: https://github.com/simonsobs/sodetlib/issues/273, it seems the timeout for caget retries used to be 5 seconds, but now it looks like it's 2 seconds, which could very well be the cause of many of the EPICS issues seen recently. @swh76, do you have any idea what may have caused this? It is possible that we were relying on the default timeout in pyepics and that it changed in some version update.
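
For illustration, here is a minimal sketch of passing the timeout explicitly on each call instead of relying on a library default. The helper `get_with_retry` and its parameters are hypothetical and are not part of sodetlib or pysmurf. It assumes a getter that accepts a `timeout` keyword and returns `None` on timeout, which to my knowledge is how `epics.caget` in pyepics behaves.

```python
import time
from typing import Any, Callable, Optional


def get_with_retry(
    getter: Callable[..., Any],
    pvname: str,
    timeout: float = 5.0,
    retries: int = 5,
    pause: float = 0.0,
) -> Optional[Any]:
    """Call getter(pvname, timeout=timeout) until it returns a non-None value.

    Hypothetical helper: makes one initial attempt plus up to `retries`
    retries, always passing the timeout explicitly so the behaviour does
    not depend on whatever default the underlying library ships with.
    """
    for attempt in range(retries + 1):
        value = getter(pvname, timeout=timeout)
        if value is not None:
            return value
        if attempt < retries:
            # Optional pause between attempts to give the server time to recover.
            time.sleep(pause)
    return None


# Possible usage against a live server (assumes pyepics is installed):
#
#   import epics
#   version = get_with_retry(
#       epics.caget,
#       "smurf_server_s2:AMCc:SmurfApplication:SmurfVersion",
#       timeout=5.0,
#   )
```

Whether the caget calls made during setup can be given a per-call timeout in this way depends on how pysmurf wraps pyepics, which this sketch does not assume anything about.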