Closed dpdutcher closed 2 years ago
@dpdutcher what was the solution for this?
The interim solution was to run S._caput('smurf_server_s2:AMCc:SmurfProcessor:SOStream:BuilderEncode', 1)
for each slot, and the longer term solution was Jack set that to be the default in an update to the smurf-streamer. v0.4.0 certainly has the fix.
Describe the bug
In Princeton testing of the SO detector modules, operating three slots simultaneously causes epics to crash on one of the boards, and that board needs to be hammered to become operational again.
Here, "operating" means taking either IV curves or bias-step data using sodetlib functions. This problem has not occurred when operating two slots simultaneously, but regularly occurs when operating three. The crash is not immediate, sometimes happening after one hour, two hours, or 12 hours. It is not always the same slot that crashes. The other two slots remain operational while the third one has crashed.
Additional details
RuntimeError: epics failed to respond
.Machine is smurf-srv15. Relevant log files for such a crash that happened today are:
/data/cores/core_1647369167_python3_3917_4508_0_0
/data/logs/16473/1647374577
Beginning of traceback from ocs-pysmurf docker: