simonsobs / socs

Simons Observatory specific OCS agents.
BSD 2-Clause "Simplified" License
12 stars 12 forks source link

HWP PCU agent minor modification #599

Closed 17-sugiyama closed 9 months ago

17-sugiyama commented 9 months ago

I tested the recently added hwp_pcu agent, and found that 3 seconds of timeout is sometimes too short to communicate with the HWP PCU module. This is a request to extend timeout for sending commands/receiving messages.

Description

I extended the timeout for send_command from 3 sec to 10 sec; and for get_status to 10 sec.

Motivation and Context

Improving the operation stability

How Has This Been Tested?

I tested this agent in the lab and it worked. The recently added agent worked fine except for the timeout issue.

Types of changes

Checklist:

jlashner commented 9 months ago

@BrianJKoopman a bit confused about your comment (I think it was me who suggested non-zero timeouts). Why would a timeout of zero help here?

jlashner commented 9 months ago

I think the problem is the long acq loop time: https://github.com/simonsobs/socs/blob/aaf066e974c0eaea3aff420ae9eb077977174213/socs/agents/hwp_pcu/agent.py#L193

While this is sleeping, the lock cannot be acquired by other processes. I think to fix we should make this a fast loop (.1 sec or so) and only take and publish data on iterations where 5 sec have elapsed.

Or we could also restructure the agent to avoid TimeoutLocks.

BrianJKoopman commented 9 months ago

I think the problem is the long acq loop time:

https://github.com/simonsobs/socs/blob/aaf066e974c0eaea3aff420ae9eb077977174213/socs/agents/hwp_pcu/agent.py#L193

While this is sleeping, the lock cannot be acquired by other processes. I think to fix we should make this a fast loop (.1 sec or so) and only take and publish data on iterations where 5 sec have elapsed.

Ah, yup, agreed this 5 second sleep is the issue, the task timeouts would need to be longer than that. Sorry for the noise. I've just seen non-zero acq timeouts cause issues in the past, and the example in the docs has a 0 timeout in long running processes.

jlashner commented 9 months ago

I realize it might not be clear what I'm talking about when I say lockless agent restructure, so I threw together what I have in mind in this draft PR for reference which should fix the locking issues. Feel free to use that fix if you would like, but may need some testing:

https://github.com/simonsobs/socs/pull/600

17-sugiyama commented 9 months ago

Thank you so much for your comments and suggestions. I prefer to use Jack's lockless agent. I tested Jack's agent with the PCU instrument and it worked. I left some comments to #600. I'd like to close this pull request.