Closed mhesselbarth closed 4 years ago
I tried to increase all timeout settings, but still getting the error that the workers reach a timeout and are terminated.
Example from my log file:
Error in clustermq:::worker("tcp://gwdu103:8922")
Timeout reached, terminating
The documentation is in this PR, but I haven't deployed the update on the web page yet (because it's not yet released).
What does your SSH log say?
Note that clustermq.ssh.timeout is for SSH startup, while the worker timeout is likely hit during runtime.
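The two options cover different phases of a job, so they need to be set separately. A minimal sketch of setting both before dispatching work (the option names come from this thread; the values are arbitrary and I'm assuming they are in seconds):

```r
# Set before any clustermq workers are created, e.g. at the top of your script:
options(
  clustermq.ssh.timeout = 30,      # how long to wait for the SSH connection at startup
  clustermq.worker.timeout = 600   # how long a worker waits without contact before terminating
)
```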
Are you transferring large amounts of data over SSH? That could be one reason. Another is your SSH connection getting disconnected altogether, which may be solvable by changing the default timeouts.
Thank you very much for your help.
I think the reason was that the data transferred over SSH was too large (about 1 GB).
@mschubert If the data being transferred via SSH is larger, let's say 1 GB+, is there a way to increase the worker timeout? I never have an issue with the SSH startup, but sometimes the data I'm sending is on the larger side, and my workers time out.
Other than not sending large data over SSH, any suggestions for how to work around any timeout issues?
@mattwarkentin You can set clustermq.worker.timeout.
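For the large-transfer case, a hedged sketch of raising that option (the value here is arbitrary; I'm assuming it is in seconds and must be set before jobs are dispatched):

```r
# Give workers more time to stay alive while a large dataset transfers over SSH.
# 1200 is a made-up value; tune it to how long your transfer actually takes.
options(clustermq.worker.timeout = 1200)
```

An alternative worth considering is avoiding the transfer entirely, e.g. writing the large object to cluster-shared storage with saveRDS() and having the worker function readRDS() it, so only a path goes over SSH.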
I still need to document the options better; my attempt is in #218.
I had some troubles with the SSH connection lately and found that there might be an option (clustermq.ssh.timeout) available. However, I couldn't find any documentation on how and where to set it. I guess the local .Rprofile should do the trick?
Any help would be highly appreciated.
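The guess about .Rprofile is the standard mechanism for this kind of option: R sources a project-local .Rprofile at startup, so options set there apply before clustermq runs. A minimal sketch (the value is arbitrary; I'm assuming seconds):

```r
# .Rprofile in the project directory, sourced automatically when R starts there
options(clustermq.ssh.timeout = 60)  # wait up to 60 s for the SSH connection
```

Setting the same options() call at the top of the analysis script, before any clustermq calls, should work equally well.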