ml-tooling / ml-hub

🧰 Multi-user development platform for machine learning teams. Simple to setup within minutes.
Apache License 2.0

Support GPUs on multiple machines (via docker-swarm or kubernetes)? #19

Open Ledenel opened 4 years ago

Ledenel commented 4 years ago

Feature description:

Support Docker Swarm (with GPU support) out of the box.

Problem and motivation:

As described here, it is currently not possible to run ml-hub with GPU support across multiple machines, where each machine may have one or more GPU cards. Building and managing a Kubernetes cluster with GPU support is not easy (and I'm not familiar with Kubernetes). A more lightweight solution such as Docker Swarm might support this more seamlessly via nvidia-docker.
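For illustration, one way a Swarm node can advertise its GPUs to the scheduler is as generic resources. The sketch below is a minimal, unofficial example. It assumes nvidia-docker 2 (`nvidia-container-runtime`) is installed on each node and uses dockerd's `node-generic-resources` daemon option. The function name `swarm_daemon_config` and the GPU UUIDs are hypothetical. It only generates a `daemon.json` fragment; other node setup steps, and any changes ml-hub would presumably need in order to request the `NVIDIA-GPU` resource when spawning services, are not shown.

```python
import json


def swarm_daemon_config(gpu_uuids):
    """Build a hypothetical dockerd daemon.json fragment for one Swarm node.

    Assumption: each GPU is advertised to Swarm as a generic resource via the
    "node-generic-resources" option, one "NVIDIA-GPU=<uuid>" entry per card.
    """
    return {
        # Run containers with the NVIDIA runtime so they can access the GPUs.
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
        },
        # One discrete generic resource per physical GPU on this node.
        "node-generic-resources": [f"NVIDIA-GPU={uuid}" for uuid in gpu_uuids],
    }


if __name__ == "__main__":
    # Hypothetical UUIDs; on a real node they can be listed with `nvidia-smi -L`.
    config = swarm_daemon_config(["GPU-aaaa", "GPU-bbbb"])
    print(json.dumps(config, indent=2))
```

Each machine in the swarm would get its own fragment listing only its local GPUs. This lets the Swarm scheduler place a GPU-requesting service on a node that still has a free card.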

Is this something you're interested in working on?

Yes

Ledenel commented 4 years ago

By the way, Kubernetes seems to support GPU management via device plugins: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins. So why is GPU mode not supported on Kubernetes? Is it due to a lack of standards, historical reasons, or is it just waiting for someone to implement it?
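With a device plugin such as NVIDIA's installed, GPUs show up as the extended resource `nvidia.com/gpu`, and a pod requests them under `resources.limits`. The sketch below builds such a pod manifest as a plain Python dict. It is a minimal illustration, not how ml-hub spawns workspaces. The function name `gpu_pod_manifest` and the image name are hypothetical.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Kubernetes Pod manifest that requests GPUs via a device plugin.

    Assumes the NVIDIA device plugin is installed on the cluster, which exposes
    the extended resource "nvidia.com/gpu". GPUs are requested in `limits`.
    """
    if gpus < 1:
        raise ValueError("gpus must be >= 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # The scheduler only places this pod on a node with enough free GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name, used only for illustration.
    manifest = gpu_pod_manifest("ml-workspace-gpu", "example/ml-workspace-gpu:latest", gpus=2)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the output can be piped into `kubectl apply -f -`. The scheduler then picks a node across the cluster that has two unallocated GPUs.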