kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0
1.61k stars, 700 forks

Fine-Tune APIs for LLM Documentation #2013

Closed. StefanoFioravanzo closed this issue 5 months ago

StefanoFioravanzo commented 8 months ago

This issue tracks the Kubeflow 1.9 documentation deliverables for the new Fine-Tune APIs for LLMs.

cc @kubeflow/wg-training-leads @deepanker13
cc release team docs leads @diegolovison @hbelmiro

StefanoFioravanzo commented 8 months ago

This feature is based on this work:

Do we have supporting documentation for the TFJob and PyTorchJob Function APIs? Should we track it separately or include it in the architecture/API doc above?

deepanker13 commented 8 months ago

@StefanoFioravanzo I can help with the tutorial. Also, do you have any reference for the API documentation?

andreyvelich commented 7 months ago

Example Notebooks can be found here:

StefanoFioravanzo commented 6 months ago

@andreyvelich @deepanker13 are we writing a tutorial based on these APIs eventually?

andreyvelich commented 6 months ago

> @andreyvelich @deepanker13 are we writing a tutorial based on these APIs eventually?

We already have this Notebook to try out this feature: https://github.com/kubeflow/training-operator/blob/master/examples/pytorch/text-classification/Fine-Tune-BERT-LLM.ipynb

I think we can initially just link this Notebook on the website.

StefanoFioravanzo commented 6 months ago

Ok! Where do you suggest we link it?

andreyvelich commented 6 months ago

> Ok! Where do you suggest we link it?

I already linked it here: https://deploy-preview-3718--competent-brattain-de2d6d.netlify.app/docs/components/training/user-guides/fine-tuning/#next-steps

hbelmiro commented 5 months ago

@StefanoFioravanzo is there any pending work to close this issue?

andreyvelich commented 5 months ago

This should be complete. Thanks @StefanoFioravanzo for your help!