intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

DLRover - Flyte integration #1275

Open davidmirror-ops opened 3 days ago

davidmirror-ops commented 3 days ago

Hello DLRover team:

I'm a member of the LF AI&Data TAC and voted for this project to enter the sandbox because it's very interesting, especially the optimizations you implement.

I'm also a maintainer of Flyte, a graduated LF AI&Data project, and we'd like to discuss with you the scope of an integration where we can collaborate. We see several possible synergies between the two projects, such as your K8s approach, the integrations with Ray, and a Python interface. A DLRover plugin could benefit the Flyte community by giving it access to resource-efficient distributed training for LLMs, and benefit DLRover users by giving them a K8s-native, Python-first orchestrator.

We can start a thread here to coordinate a call or something.

Thanks in advance.

BalaBalaYi commented 3 days ago

First of all, thank you very much for your interest in the DLRover project. We are also very eager to collaborate with other outstanding open-source projects. We still need to discuss the approach and specific details of the collaboration internally, and we will follow up with further feedback later.