Closed: andreyvelich closed this issue 2 months ago
Thank you for raising this great issue! Describing all features in the doc would be great. For example, we don't have any doc for TFJob with enableDynamicWorker.
So, as a first iteration, we should identify which features have no documentation.
cc @andreeamun
@andreyvelich @tenzen-y As discussed, I looked into the training operator docs and I want to propose an initial refactoring to better align with best practices in how technical docs are organized.
A little premise to my proposal: in general you want tech docs to be organized in macro sections that roughly address
In our case we may also want to consider a "Developer" section, particularly useful for OSS projects.
Now, I can see clear ways to improve the current doc structure to better align with that model. Here are some suggestions:
This doesn't have to happen all in one PR, which is why I split it into sequential steps. Let me know what you think. We can start iterating on some of these points in draft PRs, and I am happy to get this started.
Thank you so much for this @StefanoFioravanzo, I really like your ideas. A few questions:
all the CRD reference + implementation details go here.
We don't have CRD reference right now, how should we split these sections?
@kubeflow/wg-training-leads what are your thoughts?
@andreyvelich
Should we order Installation before Getting Started page ?
Yes, let's keep installation before getting started. It makes sense for folks who need to go through the installation before getting hands-on.
Do we want to separate guides between Users, Administrators, and Developers
I am in favour of having additional grouping based on the persona. But, as a first step, I recommend limiting the amount of change. So, as you suggest, let's move all how-tos/guides to a generic "user guides" section. Once we go through this initial restructuring exercise, we can further refine.
We don't have CRD reference right now, how should we split these sections?
I think we do. I think I saw some generic CRD reference for some of the frameworks. If we don't have enough details, we can still add a "TBD" under a framework's reference/API guide.
@StefanoFioravanzo I think we have only this one: https://github.com/kubeflow/training-operator/blob/master/docs/api/kubeflow.org_v1_generated.asciidoc, but I am not sure if we keep this doc updated. Do we, @kubeflow/wg-training-leads?
@andreyvelich since we merged https://github.com/kubeflow/website/pull/3719, can we revisit the first comment of this issue? What do we want to address for training operator 1.8 (Kubeflow 1.9)?
I think, as part of Kubeflow 1.9 we completed all items. Let me close this issue.
On the recent AutoML and Training WG call we discussed how we can improve the documentation for Training Operator and onboarding for new contributors.
We identified several action items that we can work on before the release:
Why use Kubeflow Training Operator?
Where we can explain user stories and how Training Operator can manage distributed training for various ML frameworks in a single place, so ML engineers can easily train their ML models using a unified operator.
TrainingClient, ref issue in Katib repo: https://github.com/kubeflow/katib/issues/2081
Please let me know if we should add something else @kubeflow/release-managers @kubeflow/wg-training-leads @tenzen-y @shashank-iitbhu.