mars-project / mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
https://mars-project.readthedocs.io
Apache License 2.0
2.7k stars 326 forks

Conventional schedulers (Slurm/PBS/Loadleveler) compatibility? #97

Open surak opened 5 years ago

surak commented 5 years ago

Is your feature request related to a problem? Please describe.
Most supercomputers in the world use one of the few available schedulers, such as the ones mentioned in the title. These usually don't play well with other schedulers.

Describe the solution you'd like
To be able to run Mars directly from a Slurm session.

Describe alternatives you've considered
In many SPMD environments, one can submit a single program to run on a number of compute nodes. This program should be able to run as the master on one node and as a worker on all the other nodes, something like a front-end to Mars.
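The SPMD idea above can be sketched in a few lines: every node runs the same program, and each copy decides its role from the rank Slurm assigns it. This is a minimal sketch under stated assumptions, not Mars code: it assumes one Slurm task per node and that rank 0 (`SLURM_PROCID == "0"`) runs on the first host of the allocation. The helper name `slurm_role` and the port `7103` are hypothetical. `SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_JOB_NODELIST` and `scontrol show hostnames` are real Slurm features. How the actual Mars master or worker process would be started is deliberately left out.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Sequence

# Hypothetical port for the master; not a documented Mars default.
MASTER_PORT = 7103


def expand_nodelist(nodelist: str) -> List[str]:
    """Expand a compressed Slurm node list (e.g. "node[01-04]") into hostnames.

    Calls the real `scontrol show hostnames` command, so it only works
    inside a Slurm allocation.
    """
    out = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def slurm_role(env: Mapping[str, str], hostnames: Sequence[str],
               port: int = MASTER_PORT) -> Dict[str, object]:
    """Hypothetical helper: decide whether this task is the master or a worker.

    Assumes one task per node and that rank 0 runs on the first host of the
    allocation, which is the usual case with Slurm's default block distribution.
    """
    rank = int(env["SLURM_PROCID"])
    ntasks = int(env.get("SLURM_NTASKS", len(hostnames)))
    role = "master" if rank == 0 else "worker"
    # Every task derives the same master address, so workers know where to connect.
    master = f"{hostnames[0]}:{port}"
    return {"role": role, "rank": rank, "ntasks": ntasks, "master": master}


if __name__ == "__main__" and "SLURM_PROCID" in os.environ:
    info = slurm_role(os.environ, expand_nodelist(os.environ["SLURM_JOB_NODELIST"]))
    # Launching the real Mars master/worker is intentionally omitted here:
    # the right command depends on the Mars version.
    print(info)
```

Deriving the master address from the node list, rather than exchanging it at runtime, keeps the program a pure SPMD launch: no task has to wait for another to publish where the master lives.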

Additional context
Supercomputer schedulers are quite simple in operation: the user submits a job to the batch system, which waits until the requested resources are available and then runs the code in all the processes. It's that simple.
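The batch workflow described above boils down to a short job script. The sketch below generates one from Python. It is an illustration only: the `#SBATCH --nodes`, `--ntasks-per-node` and `--time` directives and `srun` are real Slurm features, but `make_sbatch_script` and the launcher `launch_mars.py` are hypothetical. The launcher is assumed to be a program in which each task picks its own role (master or worker) from `SLURM_PROCID`.

```python
def make_sbatch_script(nodes: int, time_limit: str,
                       launcher: str = "launch_mars.py") -> str:
    """Build a Slurm batch script that starts one task per node.

    `launcher` is a hypothetical Python script in which each task picks its
    own role (master or worker) from SLURM_PROCID.
    """
    if nodes < 2:
        # One node for the master plus at least one worker node.
        raise ValueError("need at least one master node and one worker node")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --ntasks-per-node=1",
        f"#SBATCH --time={time_limit}",
        "",
        "# srun starts the same program once per node (SPMD style).",
        f"srun python {launcher}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_sbatch_script(nodes=4, time_limit="01:00:00"), end="")
```

Saved as `job.sh`, the script would be submitted with `sbatch job.sh`; Slurm then waits for four nodes and runs the launcher once on each.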

qinxuye commented 5 years ago

The current developers of Mars don't have a background in, or experience with, supercomputers, so we may need help from the community. It would be fantastic if you could try deploying Mars and its runtime to such an environment and contribute the code back; we would really appreciate that.

raybellwaves commented 5 years ago

Worth taking a look at https://github.com/dask/dask-jobqueue

chaokunyang commented 2 years ago

Can we add other schedulers like Slurm here using the execution API? @fyrestone

fyrestone commented 2 years ago

> Can we add other schedulers like Slurm here using the execution API? @fyrestone

It seems that this is a deployment problem, not an execution issue.