Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
[BUG] Fix the `GPU DEVICE NOT FOUND` error when deploying Mars on Ray #3333
Closed
dlee992 closed 1 year ago
What do these changes do?
Mars itself can run with GPU resources, but Mars on Ray and Mars on Ray DAG currently cannot.
This PR fixes the latter by handling the following parts:
- `GPU` related arguments when starting a Mars session
  - `worker_gpus` argument to specify ...
  - `supervisor_gpus` argument to specify ...
- `GPU` resources when creating `RayMainPool`
- `GPU` resources when creating `RaySubPool`
- `GPU` resources when executing the subtask graph in `Mars-on-Ray-DAG` mode
- `GPU` resources when creating Ray's `placement group`
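
As a rough illustration of the last item, the sketch below shows how per-node GPU counts could be turned into the resource bundles that a Ray placement group reserves. It is not the code from this PR: `build_bundles` and its `n_workers`, `worker_cpus` and `supervisor_cpus` parameters are hypothetical names chosen for the example, and only `worker_gpus` and `supervisor_gpus` come from the description above. It assumes one bundle for the supervisor and one per worker.

```python
from typing import Dict, List


def build_bundles(
    n_workers: int,
    worker_cpus: int,
    worker_gpus: int = 0,
    supervisor_cpus: int = 1,
    supervisor_gpus: int = 0,
) -> List[Dict[str, float]]:
    """Hypothetical helper: one bundle for the supervisor, then one per worker."""

    def bundle(cpus: int, gpus: int) -> Dict[str, float]:
        resources = {"CPU": float(cpus)}
        # Only request "GPU" when it is needed, so CPU-only nodes stay schedulable.
        if gpus > 0:
            resources["GPU"] = float(gpus)
        return resources

    return [bundle(supervisor_cpus, supervisor_gpus)] + [
        bundle(worker_cpus, worker_gpus) for _ in range(n_workers)
    ]


bundles = build_bundles(n_workers=2, worker_cpus=4, worker_gpus=1)

# With Ray installed and initialized, the bundles could be reserved like this:
#   from ray.util.placement_group import placement_group
#   pg = placement_group(bundles, strategy="SPREAD")
```

If the `"GPU"` entry is missing from a bundle, tasks or actors scheduled into it have no GPU reserved for them, which is one plausible way to end up with a GPU-not-found failure on an otherwise GPU-equipped cluster.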
Related issue number
Fixes #3334.