Open tristpinsm opened 2 months ago
I think we may want to test this... PVs in the smurf_command module use a 10 second timeout, so a retry happens roughly every 10 sec. We may want to implement the same thing here; as things are now I believe retries will happen immediately which I don't think we want.
OK, I've added a timeout between restarts and I can try it out on a test system at SLAC this afternoon. Just to confirm that it is retrying as expected when it can't reach the cryo card. However I'm not sure how to determine the appropriate number of retries or time in between. Currently it's set to retry 5 times, at 5s intervals.
Alright so I've tested CryoCard.read_temperature
with no cryo card connected and it's retrying as it should:
[ 2024-05-17 19:25:19 ] CryoCard.do_read waiting 5s before retry 2 / 5.
[ 2024-05-17 19:25:24 ] CryoCard.do_read waiting 5s before retry 3 / 5.
[ 2024-05-17 19:25:29 ] CryoCard.do_read waiting 5s before retry 4 / 5.
[ 2024-05-17 19:25:34 ] CryoCard.do_read waiting 5s before retry 5 / 5.
However, after those attempts it returns a nonsense value, which is just how the code is designed.
I'm confused how this is working, shouldn't the continue
bypass the sleep and log? Maybe this is an instance where it does not timeout but the returned address does not match.
I think instead of manually sleeping, it makes more sense to set the timeout in the epics get fn, like here: https://github.com/slaclab/pysmurf/blob/1ff66325bf174f975c241b5cf6922f378dd58b20/python/pysmurf/client/command/smurf_command.py#L208-L211 5 sec timeout with 5 retries sounds good to me as a default!
I'm confused how this is working, shouldn't the
continue
bypass the sleep and log? Maybe this is an instance where it does not timeout but the returned address does not match.
yeah, that's a mistake...
I think instead of manually sleeping, it makes more sense to set the timeout in the epics get fn, like here:
5 sec timeout with 5 retries sounds good to me as a default!
ok
This is looking good to me... one more thing though, I think we should raise an exception on failure instead of just returning 0. Can we add that here?
This is looking good to me... one more thing though, I think we should raise an exception on failure instead of just returning 0. Can we add that here?
Yeah, I agree that makes a lot more sense. I wonder if that change may break other code that is relying on the current behaviour though. I guess I can have look and see.
Description
Addresses issue #792
Jira Issue
N/A
Tests done on this branch
It's a simple enough fix that that testing didn't seem necessary.
Function interfaces that changed
N/A
What was the interface before the change
N/A
What will be the new interface after the change
N/A