mwojcikowski closed this issue 9 years ago.
Maciej, thanks for testing this out, and sorry about the delays. Within ipython-cluster-helper we check at intervals of 30 seconds (unless spinning up a non-distributed test cluster), so I believe the remaining delay comes from within IPython parallel itself. I don't personally know of any parameters to tweak that would make it faster. It might be worth asking the IPython folks for advice. We're happy to try out new parameters here if you can identify things to fix. Sorry not to have something more practical to suggest, but I hope this helps.
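The actual check in ipython-cluster-helper is not reproduced here. The following is a hypothetical sketch of a fixed-interval readiness poll. It shows why a 30 second interval rounds the observed startup time up to the next poll. `wait_for_engines` and `count_ready` are illustrative names. `count_ready` is assumed to be any callable that returns the number of registered engines, for example one wrapping `len(client.ids)` on an IPython parallel `Client`. The clock and sleep functions are injectable so the timing can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_engines(
    count_ready: Callable[[], int],
    n_engines: int,
    poll_interval: float = 30.0,
    timeout: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll count_ready() until n_engines are up; return elapsed seconds.

    Readiness is only observed at poll times, so the reported startup
    time is the first poll at or after the moment the engines are ready.
    """
    start = clock()
    while True:
        if count_ready() >= n_engines:
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError(f"only {count_ready()} of {n_engines} engines ready")
        sleep(poll_interval)
```

Under this simplified model, engines that are ready after about 20 s are reported at 30 s with a 30 s poll and at 20 s with a 5 s poll. Any additional delay comes from elsewhere, such as heartbeat-based registration.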
After lowering both values to 5 sec (5000 and 5), the setup time is around 30 sec, which is acceptable. By setup time I mean the delay between the start of ipcontroller and the start of the computations.
I'm not saying these are the "right" values, but it might be wise to expose them to the user as adjustable settings. In any case, are these checks costly enough to justify running them so rarely?
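One way to make the two values adjustable is a small settings object passed in by the user. The class and field names below are hypothetical and not part of ipython-cluster-helper. The defaults assume that "5000 and 5" means a 5000 ms heartbeat period and a 5 s readiness-check interval. `detected_at` models only the polling delay and ignores the heartbeat.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupTiming:
    """Hypothetical user-facing knobs for cluster startup checks."""

    heartbeat_period_ms: int = 5000  # assumed: controller-to-engine ping period
    poll_interval_s: float = 5.0  # assumed: how often readiness is checked

    def __post_init__(self) -> None:
        # Reject nonsensical settings early rather than hanging later.
        if self.heartbeat_period_ms <= 0 or self.poll_interval_s <= 0:
            raise ValueError("timing values must be positive")

    def detected_at(self, ready_after_s: float) -> float:
        """First poll time at or after the engines become ready."""
        return math.ceil(ready_after_s / self.poll_interval_s) * self.poll_interval_s
```

For example, `StartupTiming(poll_interval_s=30).detected_at(20)` gives 30, while the 5 s default gives 20. A shorter interval trades more frequent checks for at most one interval of added latency.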
Maciej, thanks for testing this and for the suggestions. I pushed a fix that increases the ping frequency and at the same time allows more missed pings. That way we can continue to handle interruptible queues (the goal of the long timeout) without slowing down startup. If this works well for you in practice, we can roll a new release with these fixes. Thanks again for all the helpful feedback.
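The trade-off in this fix can be written as simple arithmetic. The sketch assumes a simplified model in which an engine is dropped after a fixed number of consecutive missed pings. Under that model, the tolerated outage is roughly the ping period times the allowed misses. IPython parallel's heart monitor is assumed here to expose these as `HeartMonitor.period` (in ms) and `HeartMonitor.max_heartmonitor_misses`. The function names and the numbers in the usage note are illustrative, not the values in the pushed fix.

```python
import math


def tolerated_silence_s(period_ms: float, max_misses: int) -> float:
    """Approximate seconds of unanswered pings an engine survives before being dropped."""
    return period_ms * max_misses / 1000.0


def misses_needed(period_ms: float, window_s: float) -> int:
    """Allowed misses that keep at least window_s of tolerance at the given ping period."""
    return math.ceil(window_s * 1000.0 / period_ms)
```

For instance, a 30000 ms period with 4 allowed misses and a 5000 ms period with 24 allowed misses both tolerate about 120 s of silence. The faster ping, however, notices live engines much sooner.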
You're welcome. I like the tool very much and intend to make further contributions if possible. Off the top of my head:
```python
from joblib import Parallel, delayed
res = Parallel(n_jobs=8)(delayed(long_func)(arg) for arg in args)  # one delayed call per arg
```
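In this pattern, `delayed` captures each call without running it, and `Parallel` consumes a generator that yields one captured call per argument. Below is a minimal sketch of that interface using hypothetical stand-ins built on `concurrent.futures`, not joblib itself. It suggests how a cluster backend could slot in: pass an executor that submits work to the cluster instead of using the local thread pool. Such an executor is assumed and not shown, and it is not an existing ipython-cluster-helper API.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, Iterable, Optional, Tuple


def delayed(func: Callable[..., Any]) -> Callable[..., Tuple]:
    """Capture a call as (func, args, kwargs) instead of running it."""
    def capture(*args: Any, **kwargs: Any) -> Tuple:
        return func, args, kwargs
    return capture


class Parallel:
    """Run captured calls on an executor and return results in input order."""

    def __init__(self, n_jobs: int = 1, executor: Optional[Executor] = None):
        self.n_jobs = n_jobs
        self.executor = executor  # hypothetical hook for a cluster-backed executor

    def __call__(self, tasks: Iterable[Tuple]) -> list:
        own_pool = self.executor is None
        pool = ThreadPoolExecutor(max_workers=self.n_jobs) if own_pool else self.executor
        try:
            futures = [pool.submit(f, *a, **kw) for f, a, kw in tasks]
            return [fut.result() for fut in futures]
        finally:
            if own_pool:
                pool.shutdown()
```

Usage mirrors the joblib call: `Parallel(n_jobs=4)(delayed(pow)(x, 2) for x in range(5))` returns `[0, 1, 4, 9, 16]`.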
Maciej, thanks; we're always open to ideas for better abstractions and ways of using this. Within our group, we primarily use ipython-cluster-helper in bcbio, so some of the abstractions there might be helpful as examples of what to do (or not to do, depending on your opinions). We wrap joblib and ipython-cluster-helper to provide a single way to run code either locally or remotely:
https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/distributed/prun.py
https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/distributed/ipythontasks.py
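The "same call, local or remote" idea can be stripped down to a single dispatch function. This is an illustration only, not bcbio's `prun` code, and `run_parallel` and its arguments are hypothetical. Local runs use a thread pool. The remote path assumes any object with a `map(fn, items)` method whose result can be iterated, such as a view yielded by ipython-cluster-helper's `cluster_view` context manager.

```python
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, Iterable, List, Optional


def run_parallel(
    fn: Callable[[Any], Any],
    items: Iterable[Any],
    cores: int = 1,
    view: Optional[Any] = None,
) -> List[Any]:
    """Run fn over items locally or on a remote view, with the same call site."""
    items = list(items)
    if view is not None:
        # Remote path (assumption): view.map returns the results or an iterable
        # over them, as a cluster view's map would.
        return list(view.map(fn, items))
    if cores <= 1:
        # Serial fallback keeps debugging simple.
        return [fn(x) for x in items]
    with ThreadPool(cores) as pool:
        return pool.map(fn, items)
```

Threads keep the sketch self-contained. CPU-bound local work would be better served by a process pool, and that choice could sit behind the same `run_parallel` signature.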
Having more lightweight ways of accomplishing this in ipython-cluster-helper would be welcome.
I'm looking for a way to tweak ipython-cluster-helper so that the cluster becomes available more quickly. My cluster is not fully utilized, so jobs get all the resources they request immediately. Currently, setting up the cluster takes ~180 sec, but I can see that all the engines are ready after only about 20 sec, and that includes the delayed startup of the ipengines.
I tried changing the heartbeat period to 5000 ms (https://github.com/roryk/ipython-cluster-helper/blob/master/cluster_helper/cluster.py#L57), but startup still took ~120 sec. Is that expected, or is something wrong with my setup?