kubeflow / community

Information about the Kubeflow community including proposals and governance information.
Apache License 2.0

New workflow proposal for Documentation changes #784

Open rimolive opened 1 week ago

rimolive commented 1 week ago

One of the biggest challenges when working on the Kubeflow 1.9 release is handling documentation changes for all Kubeflow components and add-ons. The Release Handbook describes the Docs Lead as the person who coordinates documentation changes with all WGs, and it leaves that responsibility to the Release Manager if the Release Team has no volunteers for the Docs Lead role.

Motivation

The Kubeflow community wants to close a gap identified in a past Kubeflow User Survey: making the documentation better, clearer, and more concise, so users can get the information they need about every component, the community workflows, release communication, etc.

Problem Statement / Current Scenario

Throughout the releases, documentation changes haven't been enough to cover everything in the new releases. Also, usually only one person volunteers for the Docs Lead role, which funnels every change to the entire documentation through a single flow.

To make that worse, if there are no volunteers for the Docs Lead role, this task must be done by the Release Manager. This is a bad idea given the number of responsibilities the Release Manager currently has.

Proposal

We could use the experience of the manifests sync phase in the Kubeflow releases to do the same with the documentation. That means every Working Group will keep a copy of the component documentation in the component's GitHub repository, and after Feature Freeze the Docs Lead can use bash scripts to copy the documentation content to the kubeflow/website repository.
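The copy step could look roughly like the sketch below. It is written in Python rather than bash purely for illustration, and it is not an existing Kubeflow script. The function name `sync_component_docs`, the `docs` source folder, and the `content/en/docs/components` target path are all assumptions; the proposal does not specify them.

```python
import shutil
from pathlib import Path


def sync_component_docs(
    component_repo: Path,
    website_repo: Path,
    component: str,
    docs_dir: str = "docs",  # assumed location of docs inside the component repo
    target_root: str = "content/en/docs/components",  # assumed website layout
) -> Path:
    """Replace the website copy of one component's docs with the component repo's copy."""
    source = component_repo / docs_dir
    if not source.is_dir():
        raise FileNotFoundError(f"no docs directory at {source}")

    target = website_repo / target_root / component
    # Remove the previous copy first so pages deleted upstream do not linger.
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

Calling `sync_component_docs(Path("pipelines"), Path("website"), "pipelines")` on two local checkouts would overwrite the pipelines section of the website checkout, after which the result can be reviewed and committed as a normal pull request.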

Some changes in the current roles are:

Why do we need to change?

No one is better placed to write documentation about a Kubeflow component than the WG members. With this workflow, a contributor whose change needs documentation can add both the code and the documentation in the component repo. Another advantage is that documentation changes for releases become faster and can be automated to some degree, which makes the responsibility of the Docs Lead (or of the Release Manager, if we keep the handover rule for when there are no Docs Lead volunteers) easier to manage.

Open Questions

References

Slide deck

cc @kubeflow/release-team @kubeflow/kubeflow-steering-committee @kubeflow/wg-automl-leads @kubeflow/wg-data-leads @kubeflow/wg-notebooks-leads @kubeflow/wg-pipeline-leads @kubeflow/wg-training-leads @kubeflow/wg-manifests-leads

andreyvelich commented 1 week ago

Thank you for doing this @rimolive! As we discussed in the community meeting, can we convert this proposal to an official KEP (Kubeflow Enhancement Proposal) under the community repo? We can use the same KEP template as Kubernetes: https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md.

Similar to how we did it for Kubeflow Training V2: https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2

diegolovison commented 1 week ago

I was a documentation lead for Kubeflow 1.9 and I am the documentation lead for Kubeflow 1.10. I have the following observations:

HumairAK commented 1 week ago

I cannot stress enough how important this is for improving component documentation.

It is a HUGE overhead adding documentation to a separate repo for every PR that goes to the component repo, and enforcing best practices here is just painful. @rimolive's proposal allows us to keep docs next to code, which makes it easy to review PRs and to require, within each PR, that new docs are added as part of it.

Docs Lead: Will work on maintaining a set of scripts to sync documentation changes from Kubeflow components

My only suggestion here is that the docs be pulled from tagged version commits for each component, using the version that goes into the upcoming KF release. This would also help resolve another major pain point: the kubeflow/website docs getting too far ahead, because the component versions have not yet released the code that implements the documented features.
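One way to pull docs from a tag rather than from the default branch is `git archive`, which exports a directory exactly as it existed at a given tag. The sketch below is only an illustration: the helper names, the `docs` directory, and the example tag are hypothetical, and a real sync script may work differently.

```python
import io
import subprocess
import tarfile
from pathlib import Path


def archive_command(repo: Path, tag: str, docs_dir: str = "docs") -> list[str]:
    """Build the git command that exports docs_dir as it existed at the given tag."""
    return ["git", "-C", str(repo), "archive", "--format=tar", tag, docs_dir]


def export_docs_at_tag(repo: Path, tag: str, dest: Path, docs_dir: str = "docs") -> None:
    """Extract the tagged docs into dest, leaving the repo's working tree untouched."""
    # check=True raises CalledProcessError if the tag or directory does not exist.
    result = subprocess.run(
        archive_command(repo, tag, docs_dir), check=True, capture_output=True
    )
    with tarfile.open(fileobj=io.BytesIO(result.stdout)) as tar:
        tar.extractall(dest)
```

With a hypothetical tag such as `2.3.0`, `export_docs_at_tag(Path("pipelines"), "2.3.0", Path("out"))` would write the docs for that release under `out/docs`, so features merged after the tag never reach the website early.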

There is no need for a dedicated documentation lead. Each team can handle reviews and improvements to their respective documentation independently.

I would say the open questions still warrant some sort of a documentation lead. At least for the transition period.

thesuperzapper commented 1 week ago

Personally, I think storing the docs in separate repos will be problematic.

However, I think we all agree that the core issue is allowing per-component versioning (or at least some way for end users to know what version of a component added/removed a feature).

An alternative to fully versioning each component's docs is to use a JavaScript-based approach. Everything could still live in the main repo, but we give docs writers a way to say "only show this section/page for version X.Y.Z of the component".

For example, we could have a version drop-down on each component section, which lists each version of that component, and hides sections which were added after that version when selected.

We can also put an indication within the docs themselves about which version the feature was added in.
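The filtering rule behind such a drop-down can be sketched as follows. This models only the logic, in Python, and is not the JavaScript that would run on the site; the `Section` type, the `added_in` field, and the sample versions are assumptions made for the example.

```python
from dataclasses import dataclass


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class Section:
    title: str
    added_in: str  # hypothetical marker a docs writer attaches to a section


def visible_sections(sections: list[Section], selected: str) -> list[str]:
    """Return titles of sections that already existed in the selected version."""
    chosen = parse_version(selected)
    return [s.title for s in sections if parse_version(s.added_in) <= chosen]


docs = [
    Section("Installation", "1.0.0"),
    Section("Caching", "1.10.0"),
    Section("New UI", "2.0.0"),
]
```

Here `visible_sections(docs, "1.9.0")` keeps only "Installation", while selecting "1.10.0" also shows "Caching". Parsing into integer tuples matters because a plain string comparison would sort "1.10.0" before "1.9.0".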

diegolovison commented 1 week ago

@thesuperzapper I liked the idea!

StefanoFioravanzo commented 1 week ago

@rimolive Thanks for starting this issue! I fully agree with your proposal. I think offloading documentation ownership to each WG is essential, and moving the actual doc "source code" to each WG's repository is a practical way of enforcing that ownership.

You raise valid questions and concerns that I think we can discuss and resolve in a dedicated KEP. There will certainly be multiple solutions to each one, so let's work together to find a good compromise.

It's obvious that the current way of doing things hasn't worked very well and does not allow us to scale. This is a good discussion to push the community towards a leaner, more decentralized, more scalable documentation practice.

jgarciao commented 1 week ago

Another approach could be to organize the website and documentation like the Argo project:

I think having kubeflow.org as the parent website, presenting all components and acting as a central point for community engagement (Blog, Events, ...), while keeping the details about each component on its own dedicated site, could also be a good approach.