Open andreyvelich opened 3 months ago
/assign
I can help with this. Please let me know if you have different plans @kubeflow/wg-training-leads .
Thank you, Shao! However, we need to work on the LLM Trainer before we add the post-training runtimes: https://github.com/kubeflow/training-operator/issues/2321
Thanks for pointing this out, Andrey!
Shall I unassign myself since this issue is related to #2321?
If you could also help us with #2321, that would be great! We have a few ideas with @saileshd1402, but we are still investigating how to build that Trainer so it supports different LLMs and datasets.
Sure, I'm glad to hear that I can help with #2321!
Related: https://github.com/kubeflow/training-operator/issues/2170
Once we implement storage initializers, trainers, and controllers, we should add the LLM training runtimes. We can start with a runtime for Llama 3.1 8B.
https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
/area runtime