kubeflow / website

Kubeflow's public website
Creative Commons Attribution 4.0 International

Training: Reorganized Training Operator Docs #3719

Closed andreyvelich closed 2 months ago

andreyvelich commented 2 months ago

Related: https://github.com/kubeflow/training-operator/issues/1998. I created the following sections for Training Operator docs:

A few points:

/hold for review

/assign @StefanoFioravanzo @kubeflow/wg-training-leads @hbelmiro @kuizhiqing @droctothorpe @franciscojavierarceo Looking for your feedback!

google-oss-prow[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/website/blob/master/OWNERS)~~ [andreyvelich]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

StefanoFioravanzo commented 2 months ago

@andreyvelich thanks for this!

Any ideas on what we could add to the Why Training Operator? section?

Let's start with something simple and iterate in future PRs. We can start by answering questions like:

I didn't move the CRDs to the reference in this PR since we don't have time to discuss how we are going to generate them. What do you think we should do in this PR?

Makes sense. I'd keep the scope of this PR to the restructuring you already implemented. Let's iterate on content separately. We can address each framework's user guide in dedicated PRs.

Do we need to have a working example in the Getting Started page? Would it be too complicated to consume?

Getting Started should have a simple, end-to-end working example. Generally people just want to copy-paste some code, run it, and see results. Then you typically link to more advanced tutorials or user guides at the end.

andreyvelich commented 2 months ago

That makes sense @StefanoFioravanzo. I added initial ideas for Why Training Operator? and I also added the AI/ML lifecycle diagram, which we can reuse across Kubeflow components to explain which stage of the lifecycle each component addresses (e.g. Spark Operator, Model Registry, Katib, Notebooks, KServe). Please let me know what you think, @kubeflow/wg-training-leads @StefanoFioravanzo.

andreyvelich commented 2 months ago

@franciscojavierarceo I added a working example. Does it look good?

andreyvelich commented 2 months ago

@franciscojavierarceo and I made a few changes to the AI/ML lifecycle diagram so that it is easier to use in the docs of other Kubeflow components (e.g. Katib, FEAST, Model Registry, KServe, Notebooks, Spark Operator).

andreyvelich commented 2 months ago

This PR should be ready unless you have any other comments. /hold cancel