kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0

Add more AI/ML Training Examples #2040

Open andreyvelich opened 3 months ago

andreyvelich commented 3 months ago

As we discussed previously (https://github.com/kubeflow/training-operator/pull/2021#issuecomment-1987733922), we want to add more AI/ML examples to the Kubeflow Training Operator. Right now, most of our examples cover only basic CNN training on MNIST. Since Training Operator is capable of training large-scale ML models, we would like to contribute more AI/ML use cases.

We can make these examples data-scientist-friendly and reuse our Python SDK within Jupyter Notebooks to simplify job submission for users. I like the example structure of HF Transformers, so I propose the following path: examples/<framework>/<ml-use-case>

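We can start with these examples (feel free to add more ML use cases in this issue):

- [x] Language Modeling
- [x] Image Classification
- [x] Text Classification
- [ ] Audio Classification
- [ ] Question Answering
- [ ] Speech Recognition
- [ ] Text Generation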

We should investigate how to configure our CI/CD to make sure that these examples are functional.
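One possible starting point for such a check, sketched in Python with only the standard library: walk the proposed examples/<framework>/<ml-use-case> layout and confirm that each notebook parses as JSON and contains at least one code cell. This is a structural smoke test under stated assumptions, not the project's actual CI configuration. It does not execute the notebooks, and the function names and the directory names in the demo are hypothetical.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def find_example_notebooks(root: Path) -> list[Path]:
    """Return notebooks laid out as <root>/<framework>/<ml-use-case>/*.ipynb."""
    return sorted(root.glob("*/*/*.ipynb"))


def check_notebook(path: Path) -> list[str]:
    """Return a list of problems found in one notebook (empty if none)."""
    try:
        notebook = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path}: not valid JSON ({exc.msg})"]
    if not isinstance(notebook, dict):
        return [f"{path}: top-level JSON value is not an object"]
    cells = notebook.get("cells", [])
    if not any(cell.get("cell_type") == "code" for cell in cells):
        return [f"{path}: no code cells"]
    return []


def check_examples(root: Path) -> list[str]:
    """Collect problems for every notebook under the examples root."""
    problems: list[str] = []
    for notebook_path in find_example_notebooks(root):
        problems.extend(check_notebook(notebook_path))
    return problems


if __name__ == "__main__":
    # Demo on a throwaway tree; the framework and use-case names are made up.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "examples"
        good = root / "pytorch" / "image-classification" / "train.ipynb"
        good.parent.mkdir(parents=True)
        good.write_text(
            json.dumps({"cells": [{"cell_type": "code", "source": []}]}),
            encoding="utf-8",
        )
        for problem in check_examples(root):
            print(problem)
```

A CI job could call `check_examples` on the repository's examples directory and fail when the returned list is non-empty. Actually running each notebook against a cluster would need a separate, heavier step.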

cc @kuizhiqing @johnugeorge @tenzen-y @kubeflow/wg-training-leads

/help
/good-first-issue
/area example

google-oss-prow[bot] commented 3 months ago

@andreyvelich: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to [this](https://github.com/kubeflow/training-operator/issues/2040):

>As we discussed previously: https://github.com/kubeflow/training-operator/pull/2021#issuecomment-1987733922 we want to add more AI/ML examples to the Kubeflow Training Operator. Right now, most of our examples have very basic and simple CNN training for MNIST. Since Training Operator is capable to train large-scale ML models, we would like to contribute more AI/ML use-cases.
>
>We can make these examples Data Scientists friendly and re-use our Python SDK within Jupyter Notebooks to simplify the user submission.
>I like the example structure of [HF Transformers](https://github.com/huggingface/transformers/tree/main/examples), so I propose the following path: `examples/<framework>/<ml-use-case>`
>
>We can start with these examples (feel free to add more ML use-cases in this issue):
>
>- [x] Language Modeling
>- [x] Image Classification
>- [x] Text Classification
>- [ ] Audio Classification
>- [ ] Question Answering
>- [ ] Speech Recognition
>- [ ] Text Generation
>
>**We should investigate how to configure our CI/CD to make sure that these examples are functional.**
>
>cc @kuizhiqing @johnugeorge @tenzen-y
>
>/help
>/good-first-issue
>/area example

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

xr-dev-saurabh commented 3 months ago

/assign

StefanoFioravanzo commented 2 months ago

@andreyvelich I love this. A few thoughts: