mspass-team / mspass

Massive Parallel Analysis System for Seismologists
https://mspass.org
BSD 3-Clause "New" or "Revised" License

cluster configuration for single node runs #495

Open pavlis opened 8 months ago

pavlis commented 8 months ago

I am testing a workflow on an HPC cluster at Indiana University. I am verifying that everything works correctly with a small subset of a much larger dataset, which I estimate would take 2000 minutes to run serially. I thought I could run the workflow with the simpler all-in-one mode of the container as an interactive job. Doing so, however, I notice with top running in a terminal in jupyter lab that the job seems to be using only one worker process/thread. I am running with the default interactive job setup at IU for this system, which is 8 cores on one node (note that this machine has more than 8 cores per node, which allows multiple interactive jobs to run on the same set of nodes). This particular machine has enough cores (64) that running dask on a single node should be enough to make the run time reasonable (a few hours). How do I do that? I posted this as an issue rather than discussing it with the development team because the answer is worth preserving. This will be a common need today with 64+ core machines. The answer should also go into the updates to the user manual that I hope we will get up soon.
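For reference, here is a minimal sketch of the kind of single-node setup I have in mind, written against plain dask.distributed. It is not the MsPASS container's own configuration: the helper names `usable_cores` and `start_local_cluster` are hypothetical, and how the resulting client would be handed to MsPASS is not shown. It assumes dask.distributed is installed and that one single-threaded worker process per allotted core is the desired layout.

```python
import os


def usable_cores() -> int:
    """Return the number of CPU cores this process is allowed to use."""
    try:
        # On Linux this reflects the CPU affinity mask, which a batch
        # scheduler may restrict to the cores allotted to the job.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is not available on every platform;
        # fall back to the total core count of the machine.
        return os.cpu_count() or 1


def start_local_cluster(n_workers: int):
    """Start a single-node dask cluster with one single-threaded
    worker process per requested worker (hypothetical helper)."""
    # Imported here so usable_cores() can be used without dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(
        n_workers=n_workers,      # one worker process per core
        threads_per_worker=1,     # avoid several threads sharing a process
        processes=True,           # separate processes, not threads
    )
    return Client(cluster)
```

Calling `start_local_cluster(usable_cores())` in the notebook should bring up one worker per allotted core, and `top` should then show that many python processes once work is submitted. Whether `usable_cores()` reports the 8 cores of the interactive job rather than all 64 depends on how the scheduler confines the job, so the number is worth checking before relying on it.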