Closed dmpetrov closed 1 year ago
@dmpetrov I am looking to contribute to the project "Improving and expanding User Guide" in the upcoming Google Season of Docs 2019. I came here just after the organisations were announced, and I already have experience with open source organisations. You can see my GitHub profile and GitLab profile: https://gitlab.com/Dhiraj240?nav_source=navbar
May I know how to get started with this project so that I can submit a proposal during the proposal period? I found your Discord channel for communication.
@Dhiraj240 thank you for your interest! Yeah, Discord is a better place to discuss. I'm seeing you have already started.
First, I will work on some issues this week; after that, please guide me. I will solve issues on a weekly basis. :smile:
Awesome!
The user guide structure has changed quite a bit since this was opened. Is it still desired?
Yes, I think it's still relevant. We still don't have a good intermediate structure for the UG. Not sure if it overlaps with some other tickets or duplicates them.
OK. I'd just note that to avoid 4 levels (excessive clicking) we'll need to remove the Contributing submenu and just list both contribution guides inside Customization directly.
@jorgeorpinel don't consider that split in the ticket description as a final one. It's just an example to illustrate the idea.
Other general doc structure changes (each one of these could be a good-first-issue):
- config cmd ref into the user guide #340
- [x] Some of these movements will require redirects, at least for some time.
Other general doc structure changes:
...
I have created a pull request for absorbing the understanding DVC section. Please take a look at it and let me know if there are any changes that I may need to bring about.
@VANRao-Stack thanks, I will check it out. You are also involved with https://github.com/iterative/dvc.org/issues/614#issuecomment-630561304 though, please pick one issue to focus on for now and let us know.
UPDATE: I know this issue is marked as good-first-issue, and thanks for taking the initiative, but I think this is kind of an epic with many subtasks, some of which are good first issues:
Your PR is a good start but please focus on a single subtask to make this more manageable.
The user guide structure has changed quite a bit since this was opened. Is it still desired?
Yes, I think it's still relevant. We still don't have a good intermediate structure for the UG.
So is the intention of this ticket to figure out the most efficient way to organize our current and future docs in a sustainable way? A kind of "ultimate solution" to docs structure? (If so I can update the issue's title and desc.) Cc @dmpetrov
I think for that we would need to analyze the website traffic, search results, conversions, etc. to make sure the stuff we put close to the surface is the most needed, and to determine which things can get buried or even hidden (no nav entry), even deleted (left for blog posts and support channels to cover).
Here's an interesting framework to consider, brought up by @shcheklein: https://documentation.divio.com/introduction/
documentation needs to include and be structured around its four different functions: tutorials, how-to guides, technical reference and explanation. Each of them requires a distinct mode of writing.
Here's a proposal:
topic
topic
topic
Cc @shcheklein @dberenbaum @casperdcl @iesahin
Re: doc-vs-docs, a slight update to my "slight preference for fewer chars" https://github.com/iterative/dvc.org/issues/2443#issuecomment-832808614:
Mathematics -> maths (UK)/math (US). Documentation -> doc (universal), surely? Please don't tell me it's doc (UK)/docs (US) :confused:
I wouldn't nest everything under a "guide" (or whatever name) level - seems like unnecessary user navigational difficulty.
data-mgmt/ Data Management
topic
Is Data Management a good name for the topic? DVC versions and transfers data that goes into an ML model. Data management seems like a broader term that might include data sources before the data reaches DVC: a data warehouse/DB, or a directory with "immutable" data / S3. The only scenario where DVC does proper data management is a data registry. I don't see any better name so far. Do you have any ideas?
data-pipelines/ Data Pipelines
topic
Data Pipeline refers to data engineering tools such as Airflow. I'd suggest using ML Pipelines, which might not be the best either but seems slightly better.
We probably need an additional section on model management that should include all the metrics and plot navigation commands - dvc metrics/plots. These commands are pretty independent (they should even work without dvc.yaml if a target is specified) and make the Git repo metrics-driven. Also, extracting model management into a separate topic will reduce the complexity of the ML-pipeline and experiment topics.
exp-mgmt 👩‍🔬 Experiment Management
topic
Experiment Management - "Experiment Tracking" is another term but I'm not sure which one is the best.
IMHO it may be doc or docs, but if it could be identical with the file path (currently, docs), I could navigate the [links] with my gf.
For data pipelines, another alternative may be model pipelines, as it's usually the end product, or data-model pipelines. (Data in, model out.)
Experiment Tracking is better than Management. Another may be Experiment Versioning, or Experiment Version Control, or Experiment Version Tracking.
Documentation -> doc (universal), surely?
@casperdcl it's not universal, but I also don't think it's related to US vs UK. TBH I wasn't proposing to change it, it just came out that way. But a quick check gives me the impression that docs is more common, e.g. terraform.io/docs, docs.aws.amazon.com, developers.google.com/docs, docs.microsoft.com/en-us/azure
wouldn't nest everything under a "guide" (or whatever name) level
I lean the same way, but you would need to scroll a lot to find the references without that level, so I'm not sure.
@dmpetrov agree to rethink the topic titles. If we can agree on the grouping of content, we can decide that during the PR(s).
additional section on model management that should include all the metrics and plot navigation commands - dvc metrics/plots. These commands are pretty independent ... and make Git repo metrics-driven
Maybe (structure-wise):
@iesahin with your gf?
I assume he meant gfm (github flavoured markdown) rather than the usual abbreviation for girlfriend
(I assume that doc/docs, cases, etc - are not about changing names and URLs, but rather local names here to discuss the structure and refer to the sections faster)
cases are too abstract; it's better to start with the usual things about the product - how to start, how to install, etc. Happy path pretty much. Overall, this suggestion has a lot of name changes and structural changes that are hard for me to justify. I would start by cleaning this up step by step and by introducing sections in the UG as we go (e.g. when we move shared cache). We'll have more clarity after that and can generalize things.
gf is go to file in vim. It's easier to navigate the links by having the cursor on a link and typing gf. This feature is also available in VS Code markdown plugins I think. We can't navigate the links offline because links are /doc/ but paths are docs/.
I also think it's not a big deal. I have other means to navigate.
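For anyone who does want offline navigation, the mismatch described above could be bridged with a tiny helper. This is only a minimal sketch: the function name, the `.md` extension, and the `index` fallback are all assumptions made for illustration, and the real repo layout may differ.

```python
from pathlib import PurePosixPath

# Prefixes taken from the comment above: site links start with /doc/
# while files in the repo live under docs/.
LINK_PREFIX = "/doc/"
REPO_PREFIX = "docs/"


def link_to_repo_path(link: str, extension: str = ".md") -> str:
    """Translate a site link into a repo-relative file path.

    Hypothetical helper: the .md extension and the "index" fallback
    for the section root are assumptions, not the site's actual rules.
    """
    if not link.startswith(LINK_PREFIX):
        raise ValueError(f"not a docs link: {link!r}")
    # Drop any #fragment and trailing slash before mapping.
    relative = link[len(LINK_PREFIX):].split("#", 1)[0].rstrip("/")
    if not relative:
        relative = "index"
    return str(PurePosixPath(REPO_PREFIX) / (relative + extension))


print(link_to_repo_path("/doc/user-guide/pipelines"))
```

An editor plugin or a small script could call something like this before opening the file, which would avoid having to rename either the URL prefix or the directory.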
get started here. I see it's hidden under UG, but I would not do this. It's better to have it as a top-level section.
@shcheklein a) if we remove the UG level (https://github.com/iterative/dvc.org/issues/144#issuecomment-841990301) then they're the first pages under each topic, b) we could repeat those pages in 2 places, c) we can list/link them all directly in /docs home.
I also don't mind keeping a top-level Gs group for now but its structure looks a lot like the proposed top-level (or UG) structure: Data, Pipelines, Models, Experiments, which may make the navigation confusing. Also I expect it will grow even longer and it's starting to look like a full tutorial... But that's a separate issue.
cases are too abstract, it's better to start with usual things about the product
I moved them up thinking that we won't even keep them under /docs. But as long as they're in here sure, we can keep them after GS
Install - we had a quite long discussion and decided that it's good to keep it top level, right?
It's in the top if we remove the UG level (UG is an abstract docs container in the proposal).
Guide should be starting with some overview - basic concepts
True, I forgot about Basic Concepts but it doesn't exist yet (there's an issue for that).
this suggestion has a lot of name changes, structural changes that are hard for me to justify.
Really the main proposal would be to eliminate the UG level and regroup most guides into 4 topics instead. The other big change was the redistribution of Get Started entries but it's not needed now. In summary:
/docs
Home
Install
Get Started
(Use Cases)
Data Management
Modeling Pipelines
ML Model Optimization
Experiment Tracking
Cmd Ref
API Ref
Misc?
wrt https://github.com/iterative/dvc.org/issues/144#issuecomment-841960760:
/doc/cases may extract to /cases (outside of docs) later rel #820
I'd say merge with /features and /. Complex tutorial-like bits to be extracted to other doc pages.
/doc/guide -> /doc
guide book User Guide (main container to separate from refs.) - could even remove this level?
/guide/ is a meaningless level - apparently only exists to collapse irrelevant info for those looking for /doc/api-ref/ and /doc/cli-ref/. There are other ways to emphasise *-ref (e.g. italics, bold, etc). Note that https://docs.docker.com has separate roots for "Guides," "Product manuals," "Reference," and "Samples."
/doc/dvc-files/
Metafile Formats and Internals (.dvc/) - maybe reorg into topics below
This should appear next to CLI-ref and API-ref because it's at that level of detail. Also should be called "DVC file formats/project directory structure" or something more descriptive.
/doc/data-mgmt/
/doc/data-pipelines/
/doc/exp-mgmt/
I agree with the other comments that these are misleading names.
Also presumably these will be slightly more specific than /cases / /features / / but explicitly NOT tutorials.
/doc/how-to/
These are presumably tutorials.
We desperately need to have a list of things which we consider synonyms because otherwise the language barrier seems to be the biggest problem when communicating with each other.
/doc/contributing/
Doesn't really have much business being here, I think. Should be in repo CONTRIBUTING.md / .github/CONTRIBUTING.md or repo's wiki.
/doc/troubleshooting/
This should be at /help or /support surely?
@casperdcl thanks. I think we can worry about moving /cases out of docs and about outliers (contrib, troubleshooting) in a 2nd iteration.
Yes, my main proposal is to remove the /guide level. I don't know if it's completely meaningless (the word "guide" gives you a good idea of what you'll find inside) but it's better to reorg into the topics discussed above. And I think we can find the right names as we work on that change, once/if we all agree on this.
The Q on whether /doc/dvc-files is a reference or a guide is also not so simple so I'd let it be for now (same as for where to list /start pages).
So to recap once more: a first reorg iteration would split the Guide into 4 topics. All in favor say aye
my main proposal is to remove the /guide level. I don't know if it's completely meaningless
I agree, this is what I meant. More accurately I mean "it's not useful to nest things under a level whatever you call it."
a first reorg iteration would split the Guide into 4 topics
what 4 topics? Can you update the main reference https://github.com/iterative/dvc.org/issues/144#issuecomment-841960760 to make this clear?
Yes, my main proposal is to remove the /guide level
Please, let's not do this. A few reasons off the top of my head:
/guide/ is a meaningless level - apparently only exists to collapse irrelevant info for those looking for
I don't agree. We have never spent enough time to do it right - that's why it looks this way. If we move stuff out, then all docs will look like a mess. If we plan how to move things and reorganize inside instead, it'll look better.
How-tos are technically tutorials, but they have a very precise angle - how to solve a very niche problem, vs general tutorials. You see the difference by their titles.
Let's first try to properly consolidate things under Guide and then we can decide?
OK. So would reorganizing most guides into 4 topics (as sub-levels of /doc/guide) be a good first step? Basic Concepts can stay at the beginning of the Guide section (before the 4 topics). How To and Outlier pages can come after (I think those are found most often from searches or links from other docs).
...
Guide
DVC Concepts ...
Data Management (1) ...
Modeling Pipelines (2) ...
ML Model Optimization (3) ...
Experiment Tracking (4) ...
How to ...
Troubleshooting
Contributing
* Names aren't final
Cc @casperdcl
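One way to sanity-check a proposal like this against the earlier "avoid 4 levels (excessive clicking)" concern is to measure nesting depth. A minimal sketch follows; the dict layout, the `nav_depth` helper, and the `MAX_LEVELS` threshold are hypothetical, it assumes every listed entry sits directly under Guide, and the sub-pages elided with "..." are deliberately left empty rather than guessed.

```python
# Proposed sidebar from the comment above, as nested dicts.
# Empty dicts stand for the sub-pages the proposal elides with "...".
proposed_nav = {
    "Guide": {
        "DVC Concepts": {},
        "Data Management": {},
        "Modeling Pipelines": {},
        "ML Model Optimization": {},
        "Experiment Tracking": {},
        "How to": {},
        "Troubleshooting": {},
        "Contributing": {},
    },
}


def nav_depth(tree: dict) -> int:
    """Return the number of nested menu levels in a nav tree."""
    if not tree:
        return 0
    return 1 + max(nav_depth(children) for children in tree.values())


# Assumed limit, based on the "avoid 4 levels" remark earlier in the thread.
MAX_LEVELS = 3

depth = nav_depth(proposed_nav)
print(f"levels: {depth}, within limit: {depth <= MAX_LEVELS}")
```

Once the real pages are filled in under each topic, the same check would show whether any branch (for example a Contributing submenu) pushes the sidebar past the limit.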
I don't have strong opinions on this. I'm mostly a search guy and never had difficulty navigating the docs. :)
Right. It's not just about the nav though. This organization will help us compartmentalize the areas of docs better. It should help with planning and knowing we are covering enough types of docs for enough types of users.
The idea behind the existing structure is to reflect 3-4 major parts of any docs - Refs, Guides, Quick Start, etc.
Hmm I'm thinking about a traditional book, report, article, journal etc:
AKA:
Where the unique thing about the "main body" is that (unlike all the other parts) it's never actually labelled. You don't have a chapter titled "main body" - at most you use different numbering to make a distinction, but not a nested level, e.g:
i. Use Cases
ii. Installation
1. Data Registry
2. ML Pipelines
3. ...
A. Command Ref
B. API Ref
So basically I'm not accustomed to the idea of a main body (in this case labelled guide) nesting.
Interesting analogy! To me docs != book (for a lot of reasons). But if you prefer that analogy - User Guide == DVC Book for me and should be structured the way you described. Other sections = marketing materials, quick brochures around, etc.
:shrug:
From chat with @dmitry. Some feedback from advanced DVC users on docs experience:
Docs now are too command-centric; people can sometimes get lost in the details; unclear where to find the information they need (or what they even need in the first place).
p.s. @jendefig also mentioned this. More detailed notes here (internal)
Idea on the structure of experience (not an exact map to sitemap necessarily):
DVC website doesn't have to be only about DVC -- it's about data science/ machine learning workflows
We can do a better job of referring to other products (MLEM, CML, etc.) at this level
Repeating the questions in https://www.notion.so/iterative/wip-Audiences-ccb7abb9a198476aa0c01581479c5eaf
The answers to these questions should shed some light on the docs organization. We can't be everything to everyone, so we must select the most obvious audience and make docs organization easier for them.
@iesahin @jorgeorpinel While I do agree that organizing docs should be a function of audience, I think the need and request for having workflows isn't an audience question, it's because as the tools have more features, the docs grow and become harder to absorb. (It's a lot to read/take-in). We have to figure out a way to simplify the organization of the docs to seem less overwhelming. All the documentation on commands must be there. It's a question of ingestion and that ingestion experience.
Landing on home page isn't showing a clear BIG picture of where in the ML process DVC fits (solution positioning)
I agree with this one hundred percent. We need a big-picture a-ha moment right from the start. And it needs to be a shareable image.
On 1. Are we sure that Use Case is the best term? We should use the language they are coming to us with. I'm curious what terminology was used in @dmpetrov's discussions with the same request. It is interesting that "command-centric" was used in what we both heard, but in what they were looking for instead, what terminology did they use? (What makes sense to our audience?) (Maykon definitely used "workflow")
Also regarding this in particular: why don't we just crowdsource from the Community? Ask the question: "To better serve your needs in our docs, we are asking you to fill in and upvote items in this document (public in Notion?) on what workflows you would like to see covered in our docs." For example, finish the search "How do I _____ ?" (or however we should ask this).
Also from talking to Community members, it seems they go through steps in their usage/adoption. Maybe we need to change the buttons on the main page to these milestones? I want to:
- Version my Data
- Set up my pipeline
- Version my Models
- Version my Experiments
- Examine my Results
- Automate testing
Regarding the "Big picture" image, it would be awesome to be able to click on which section of the process they are dealing with to get to the corresponding docs about our tools. (Yes, I dream big ;) )
On 2. Quick start vs. get started? This would be confusing. Why not Features and Tutorials, with a search capability in each?
Not sure about 3 and 4. Need more rumination.
The top priority should be to have a clear big picture expressed in the home page: e.g. a figure covering the entire ML lifecycle, where DVC comes in, and why that's better.
Other than that, the larger challenge is achieving a docs structure that can fit a lot of content in a way that seems simple and clear. I'm not sure that can be done under a single tree like the navigation menu we have on the left side.
💡 Maybe we need an orthogonal separation by features (mapped to the big picture above) that essentially filter content depending on the focus of each user (similar to the proposed Get Started trails @iesahin has been working on). There would be some overlap in terms of explanation and reference docs, but a single Quick Start one-pager in each one and/or a bit longer Learn "tutorial?" (current G/S pages) + room for more complex tutorials if needed.
On the term "use cases" we can definitely change it to "scenarios", "solutions", "why DVC", or something else. They should probably get out of the Docs section and be proper landing pages instead of the current /features (this has been discussed several times). And in that case we don't even need to name them anything special, they'd just be in custom URL paths like dvc.org/data-versioning .
What I had in mind talking about "audiences" is that the workflows and expectations of these audiences vary considerably. We don't have a "single basic workflow" as Git may have. DVC's MLOps workflow(s) are considerably different than Data Science workflow(s), and IMO the best way to model this is via answering the "who?" question.
If we start from "who", then it's easier to think about "what do these users do in 80% of their time with DVC?" question that will lead us to specific workflows.
On top of this, we can have several different ways to structure the documents, e.g., longer tutorials vs quick get started. I'd like to have some tests and feedback mechanisms to measure the merits of these before deciding.
Maybe we need to change the buttons on the main page to these milestones? I want to:
- Version my Data
- Set up my pipeline
- Version my Models
- Version my Experiments
- Examine my Results
- Automate testing
I like this idea. A popup perhaps linking to relevant "getting started" sections?
UPDATE!
I think that this issue is too broad so we may never be satisfied with a definitive plan for this.
Also, the main remaining problem seems to be in the User Guide (book-like explanation content) and we have several "smaller" epics to tackle that. Specifically:
As well as a few important issues related to the Command Reference:
So we should probably close this, as long as those issues reflect the directions we intend, as described in https://github.com/iterative/dvc.org/issues/144#issuecomment-1000481879 above:
Docs now are too command-centric; people can sometimes get lost in the details; unclear where to find the information they need...
- Landing on home page isn't showing a clear BIG picture;
- High-level scenarios (use cases)
- How do we teach the tools? Quick Start
- Usage Guides (explanations and/or walkthroughs) & References (purpose details, semantics)
- Everything else? (How-tos, Troubleshooting, etc.)
The only other thing from above that's not covered in the mentioned issues is the home page redesign, but I can create a separate issue for that. WDYT @dmpetrov @shcheklein @casperdcl ? Thanks
Edit: there's also a related feature request/discussion:
I think that this issue is too broad so we may never be satisfied with a definitive plan for this.
to me we can definitely come up with some action points
@jorgeorpinel and I discussed some of these issues. A summary of some of my takeaways:
cc @shcheklein
Alternative that I had in Slack (just as an idea):
What is DVC (move related techs inside it)
Currently, I started disintegrating that one as we didn't see it fitting the goals of the user guide as stated above (high-level feature explanations).
However, 💡 we could turn it into an Overview instead, which would combine its existing Core Features section as well as Related Technologies. That way we take care of 2 birds (from the bullet list above) with one stone.
🐦 🐦 🪨 ☠️ ☠️
Alternative that I had in Slack (just as an idea):
What is DVC (move related techs inside it)
Project Str
Experiments Mng (Visualizing Plots should be inside)
Data Management (Remote, Large Data, External Data)
Troubleshooting
How to (move Windows inside, maybe even Google Drive for now)
Telemetry Policy
Glossary (I would remove it - I don't think it's useful tbh - why do we need it)?
Maybe we need an additional Pipelines Mgmt section in addition to Experiments Mgmt and Data Mgmt (esp since we just created this section)?
Hm, now I see that pipelines is complicating things since pipelines are so tightly coupled to data management.
That could make sense if we position data management as including the pipelines to generate the data 🤔.
Maybe we need an additional Pipelines Mgmt section
We have https://dvc.org/doc/user-guide/pipelines and https://github.com/iterative/dvc.org/issues/2883.
data management as including the pipelines to generate the data
I think that for the Get Started it's OK to cover pipelining basics inside Data Management, but the feature set is so rich (and maybe has some non-data-centric uses) that it does deserve a user guide section. We can still mention and link a lot about it from data mgmt pages and vice versa.
Glossary (I would remove it - I don't think it's useful tbh - why do we need it)?
No strong opinion but I like it. It's at the bottom of the section so it's not in the way either.
p.s. probably the page I visit most from Git docs is https://git-scm.com/docs/gitglossary (but then again I work a lot with wording and definitions).
UPDATE: Jump to https://github.com/iterative/dvc.org/issues/144#issuecomment-841960760; E: https://github.com/iterative/dvc.org/issues/144#issuecomment-1206648437
We need to add an additional level to the user guide:
UPDATE: See discussion about structure below, and more subtasks in https://github.com/iterative/dvc.org/issues/144#issuecomment-584441916 further below