premAI-io / prem-operator

📡 Deploy AI models and apps to Kubernetes without developing a hernia
https://premai.io?utm_source=prem-operator
Apache License 2.0

Remove dependency on internal DeepSpeed container #1

Open · richiejp opened this issue 3 months ago

richiejp commented 3 months ago

We could:

  1. Open source the containers repo
  2. Contribute changes to the upstream DeepSpeed project
  3. ???
richiejp commented 3 months ago

I have two patches for DeepSpeed. One makes MIG (Multi-Instance GPU) work and the other just adds a health check. The MIG patch looks difficult to get upstreamed. The health check could easily be upstreamed if they accept it, or else dropped in favor of just checking the gRPC backend.

https://github.com/microsoft/DeepSpeed-MII/pull/445

richiejp commented 3 months ago

There's actually no suitable DeepSpeed-MII container AFAIK, so I'm moving closer to option 1. However, I'll probably put the Docker image inside the operator repo instead, because the containers repo holds other things too, and it would be nice to have at least the open-source parts in a monorepo.

richiejp commented 3 months ago

There's also been no response on the PR, so I'm moving this to the post-release milestone and will just make the container public.

richiejp commented 2 months ago

Actually, I may just bring the containers repo into this one.