roryk / ipython-cluster-helper

Tool to easily start up an IPython cluster on different schedulers.

Consider providing a way to relax the timeout. #16

Closed porterjamesj closed 8 years ago

porterjamesj commented 10 years ago

When I start up a reasonably large cluster (~300-500 ipengines), my nodes often take forever to come up. The 60-second timeout turns this into an error, but if I manually relax the timeout, everything works fine (eventually). Maybe we could provide a way to do this in the API and expose it from bcbio-nextgen? I would do it myself, but I'm not sure what y'all would want the interface to look like; I'm also not caught up with all the latest changes to the bcbio-nextgen parallel code.

roryk commented 10 years ago

Hi James,

Thanks for the note, hope you are well. You can set the timeout for a set of engines coming up with bcbio-nextgen using the --timeout parameter, in minutes. The other timeout, in ipython-cluster-helper, is the time an engine should wait for the controller to respond while it is getting registered; we had to bump that up from the defaults in IPython. Are you running into more of the first or the second type of issue?
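To make the first kind of timeout concrete, here is a minimal sketch of waiting for a set of engines to come up with a deadline expressed in minutes, like bcbio-nextgen's --timeout. This is not the bcbio-nextgen or ipython-cluster-helper implementation: the function name wait_for_engines and its parameters are hypothetical, and count_engines stands in for whatever reports the number of registered engines (for example, something like len(Client().ids) with an IPython parallel client). The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_engines(
    count_engines: Callable[[], int],
    n_engines: int,
    timeout_minutes: float,
    poll_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical helper: poll until at least n_engines have registered.

    Returns the number of registered engines, or raises TimeoutError once
    timeout_minutes have elapsed without enough engines coming up.
    """
    deadline = clock() + timeout_minutes * 60.0  # timeout is given in minutes
    while True:
        registered = count_engines()
        if registered >= n_engines:
            return registered
        if clock() >= deadline:
            raise TimeoutError(
                "only %d of %d engines registered within %s minutes"
                % (registered, n_engines, timeout_minutes)
            )
        sleep(poll_seconds)  # back off before asking again
```

This timeout only bounds how long the caller waits for engines to appear; it does not change how long each engine waits for the controller to answer its registration request, which is the second kind of timeout.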

porterjamesj commented 10 years ago

The second. I'm having issues with the timeout that's hardcoded to 60 seconds in ipython-cluster-helper, which is not the one you can adjust with the --timeout parameter.
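One possible shape for the API requested above is to turn the hardcoded 60 seconds into a keyword argument with the same default. The sketch below assumes the hardcoded value maps to IPython's EngineFactory.timeout option (the time, in seconds, an engine waits for the controller to respond to its registration request); that mapping, the helper name engine_registration_config, and the constant DEFAULT_REGISTRATION_TIMEOUT are all assumptions for illustration, not ipython-cluster-helper's actual code.

```python
# Hypothetical default: the 60-second value discussed in this thread.
DEFAULT_REGISTRATION_TIMEOUT = 60.0


def engine_registration_config(
    timeout_seconds: float = DEFAULT_REGISTRATION_TIMEOUT,
) -> str:
    """Hypothetical helper: render one line of IPython engine config.

    Assumes EngineFactory.timeout is the registration timeout in question;
    callers with large clusters could pass a larger value instead of
    editing a hardcoded constant.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return "c.EngineFactory.timeout = %s" % float(timeout_seconds)
```

Keeping the current value as the default means existing callers see no change, while a caller such as bcbio-nextgen could pass a larger value through a new (hypothetical) option for clusters of several hundred engines.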

roryk commented 10 years ago

Hi James,

Great. I think we can just bump that value up to something higher, and it should work fine. What value are you using?

porterjamesj commented 10 years ago

I'm still fooling around with it, actually; what's necessary seems to vary based on how big my cluster is. I would like to try to diagnose the root cause of the long registration delay, so maybe let's leave it alone for now, and I'll get back to you once I have more information.

roryk commented 10 years ago

OK, thanks, James. Let us know what you find out; it will be helpful for us, because I am sure it will come up in the future. Debugging performance issues on a cluster is a huge pain. Happy Monday!

roryk commented 8 years ago

Closing this out since it is pretty stale. Feel free to reopen if you end up having some suggestions. Thanks again, and hope everything is well, James.