iterative / dvc.org

📖 DVC website and documentation
https://dvc.org
Apache License 2.0

docs: "definitive" organization #144

Closed dmpetrov closed 1 year ago

dmpetrov commented 5 years ago

UPDATE: Jump to https://github.com/iterative/dvc.org/issues/144#issuecomment-841960760; E: https://github.com/iterative/dvc.org/issues/144#issuecomment-1206648437


We need to add an additional level to the user guide:

  1. Basic
    • DVC Files and Directories
    • DVC File Format
    • External Dependencies
    • External Outputs
    • Update a Tracked File
    • Anonymized Usage Analytics
  2. Customization
    • DVC Shell Autocomplete
    • IDE Plugins & Syntax Highlighting
    • Development Version
    • Contributing

UPDATE: See discussion about structure below, and more subtasks in https://github.com/iterative/dvc.org/issues/144#issuecomment-584441916 further below

Dhiraj240 commented 5 years ago

@dmpetrov I would like to contribute to the project "Improving and expanding User Guide" in the upcoming Google Season of Docs 2019. I came here just after the organisations were announced, and I already have experience with open-source organisations. You can see my GitHub profile and GitLab profile: https://gitlab.com/Dhiraj240?nav_source=navbar May I know how to get started with this project so that I can submit a proposal during the proposal period? I found your Discord channel for communication.

dmpetrov commented 5 years ago

@Dhiraj240 thank you for your interest! Yeah, Discord is a better place to discuss. I'm seeing you have already started.

Dhiraj240 commented 5 years ago

First, I will work on some issues by this week then after that please guide me. On a weekly basis, I will solve issues. :smile:

dmpetrov commented 5 years ago

Awesome!

jorgeorpinel commented 4 years ago

The user guide structure has changed quite a bit since this was opened. Is it still desired?

shcheklein commented 4 years ago

Yes, I think it's still relevant. We still don't have a good intermediate structure for the UG. Not sure if it overlaps with some other tickets or duplicates them.

jorgeorpinel commented 4 years ago

OK. I'd just note that to avoid 4 levels (excessive clicking) we'll need to remove the Contributing submenu and just list both contribution guides inside Customization directly.

shcheklein commented 4 years ago

@jorgeorpinel don't consider that split in the ticket description as a final one. It's just an example to illustrate the idea.

jorgeorpinel commented 4 years ago

Other general doc structure changes (each one of these could be a good-first-issue):

VANRao-Stack commented 4 years ago

Other general doc structure changes:

...

I have created a pull request for absorbing the understanding DVC section. Please take a look at it and let me know if there are any changes that I may need to bring about.

jorgeorpinel commented 4 years ago

@VANRao-Stack thanks, I will check it out. You are also involved with https://github.com/iterative/dvc.org/issues/614#issuecomment-630561304 though, please pick one issue to focus on for now and let us know.


UPDATE: I know this issue is marked as good-first-issue and thanks for taking the initiative but I think this is kind of an epic with many subtasks, some of which are good first issues:

Your PR is a good start but please focus on a single subtask to make this more manageable.

jorgeorpinel commented 4 years ago

The user guide structure has changed quite a bit since this was opened. Is it still desired?

Yes, I think it's still relevant. We still don't have a good intermediate structure for the UG.

So is the intention of this ticket to figure out the most efficient way to organize our current and future docs in a sustainable way? A kind of "ultimate solution" to docs structure? (If so I can update the issue's title and desc.) Cc @dmpetrov

I think for that we would need to analyze the website traffic, search results, conversions, etc. to make sure the stuff we put close to the surface is the most needed, and to determine which things can get buried or even hidden (no nav entry), even deleted (left for blog posts and support channels to cover).

jorgeorpinel commented 3 years ago

Here's an interesting framework to consider, brought up by @shcheklein: https://documentation.divio.com/introduction/

documentation needs to include and be structured around its four different functions: tutorials, how-to guides, technical reference and explanation. Each of them requires a distinct mode of writing.

jorgeorpinel commented 3 years ago

Here's a proposal:

Cc @shcheklein @dberenbaum @casperdcl @iesahin

casperdcl commented 3 years ago

Re: doc-vs-docs, a slight update to my "slight preference for fewer chars" https://github.com/iterative/dvc.org/issues/2443#issuecomment-832808614:

Mathematics -> maths (UK)/math (US). Documentation -> doc (universal), surely? Please don't tell me it's doc (UK)/docs (US) :confused:

casperdcl commented 3 years ago

I wouldn't nest everything under a "guide" (or whatever name) level - seems like unnecessary user navigational difficulty.

dmpetrov commented 3 years ago

data-mgmt/ 🗃️ Data Management topic

Is Data Management a good name for the topic? DVC versions and transfers data that goes into an ML model. Data management seems like a broader term that might include data sources before the data goes to DVC - Data Warehouse/DB or a directory with "immutable" data / S3. The only scenario where DVC does proper data management is a data registry. I don't see any better name so far 😬 Do you have any ideas?

data-pipelines/ 🔃 Data Pipelines topic

Data Pipeline refers to data engineering tools such as Airflow. I'd suggest using ML Pipelines, which might also not be the best but seems slightly better.

We probably need an additional section on model management that should include all the metrics and plot navigation commands - dvc metrics/plots. These commands are pretty independent (even should work without dvc.yaml if a target is specified) and make Git repo metrics-driven. Also, extracting model management into a separate topic will reduce the complexity of ML-pipeline and experiment topics.

exp-mgmt 👩‍🔬 Experiment Management topic

Experiment Management - "Experiment Tracking" is another term but I'm not sure which one is the best.

iesahin commented 3 years ago

jorgeorpinel commented 3 years ago

Documentation -> doc (universal), surely?

@casperdcl it's not universal but also I don't think it's related to US vs UK. TBH I wasn't proposing to change it, it just came out that way. But a quick check gives me the impression that docs is more common e.g. terraform.io/docs, docs.aws.amazon.com, developers.google.com/docs, docs.microsoft.com/en-us/azure

wouldn't nest everything under a "guide" (or whatever name) level

I incline the same way but you would need to scroll a lot to find the references without that level so I'm not sure.

@dmpetrov agree to rethink the topic titles. If we can agree on the grouping of content, we can decide that during the PR(s).

additional section on model management that should include all the metrics and plot navigation commands - dvc metrics/plots. These commands are pretty independent ... and make Git repo metrics-driven

Maybe (structure-wise):

@iesahin with your gf?

casperdcl commented 3 years ago

I assume he meant gfm (github flavoured markdown) rather than the usual abbreviation for girlfriend

shcheklein commented 3 years ago

(I assume that doc/docs, cases, etc - are not about changing names and URLs, but rather local names here to discuss the structure and refer to the sections faster)


Overall, this suggestion has a lot of name changes, structural changes that are hard for me to justify. I would start by cleaning this up step by step and by introducing sections in the UG as we go (e.g. when we move shared cache). We'll have more clarity after that and generalize things.

iesahin commented 3 years ago

gf is go to file in vim. 😁 It's easier to navigate the links by placing the cursor on a link and typing gf. This feature is also available in VS Code markdown plugins I think. We can't navigate the links offline because links are /doc/ but paths are docs/.

I also think it's not a big deal. I have other means to navigate.

jorgeorpinel commented 3 years ago

get started here. I see it's hidden under UG, but I would not do this. It's better to have it as a top-level section.

@shcheklein a) if we remove the UG level (https://github.com/iterative/dvc.org/issues/144#issuecomment-841990301) then they're the first pages under each topic, b) we could repeat those pages in 2 places, c) we can list/link them all directly in /docs home.

I also don't mind keeping a top-level Gs group for now but its structure looks a lot like the proposed top-level (or UG) structure: Data, Pipelines, Models, Experiments, which may make the navigation confusing. Also I expect it will grow even longer and it's starting to look like a full tutorial... But that's a separate issue.

cases are too abstract, it's better to start with usual things about the product

I moved them up thinking that we won't even keep them under /docs. But as long as they're in here sure, we can keep them after GS

Install - we had a quite long discussion and decided that it's good to keep it top level, right?

It's in the top if we remove the UG level (UG is an abstract docs container in the proposal).

Guide should be starting with some overview - basic concepts

True, I forgot about Basic Concepts but it doesn't exist yet (there's an issue for that).

this suggestion has a lot of name changes, structural changes that are hard for me to justify.

Really the main proposal would be to eliminate the UG level and regroup most guides into 4 topics instead. The other big change was the redistribution of Get Started entries but it's not needed now. In summary:

/docs
  Home
  Install
  Get Started
  (Use Cases)
  Data Management
  Modeling Pipelines
  ML Model Optimization
  Experiment Tracking
  Cmd Ref
  API Ref
  Misc?

casperdcl commented 3 years ago

wrt https://github.com/iterative/dvc.org/issues/144#issuecomment-841960760:

may extract to /cases (outside of docs) later rel #820

I'd say merge with /features and /. Complex tutorial-like bits to be extracted to other doc pages.

guide book User Guide (main container to separate from refs.) - could even remove this level?

/guide/ is a meaningless level - apparently only exists to collapse irrelevant info for those looking for /doc/api-ref/ and /doc/cli-ref/. There are other ways to emphasise *-ref (e.g. italics, bold, etc). Note that https://docs.docker.com has separate roots for "Guides," "Product manuals," "Reference," and "Samples."

Metafile Formats and Internals (.dvc/) - maybe reorg into topics below

This should appear next to CLI-ref and API-ref because it's at that level of detail. Also should be called "DVC file formats/project directory structure" or something more descriptive.

I agree with the other comments that these are misleading names.

Also presumably these will be slightly more specific than /cases, /features, and / but explicitly NOT tutorials.

There are presumably tutorials.

We desperately need to have a list of things which we consider synonyms because otherwise the language barrier seems to be the biggest problem when communicating with each other.

Doesn't really have much business being here, I think. Should be in repo CONTRIBUTING.md/.github/CONTRIBUTING.md or repo's wiki.

This should be at /help or /support surely?

jorgeorpinel commented 3 years ago

@casperdcl thanks. I think we can worry about moving /cases out of docs and about outliers (contrib, troubleshooting) in a 2nd iteration.

Yes, my main proposal is to remove the /guide level. I don't know if it's completely meaningless (the word "guide" gives you a good idea of what you'll find inside) but it's better to reorg into the topics discussed above. And I think we can find the right names as we work on that change, once/if we all agree on this.

The Q on whether /doc/dvc-files is a reference or a guide is also not so simple so I'd let it be for now (same as for where to list /start pages).

So to recap once more: a first reorg iteration would split the Guide into 4 topics. All in favor say aye ✋

casperdcl commented 3 years ago

my main proposal is to remove the /guide level. I don't know if it's completely meaningless

I agree, this is what I meant. More accurately I mean "it's not useful to nest things under a level whatever you call it."

a first reorg iteration would split the Guide into 4 topics

what 4 topics? Can you update the main reference https://github.com/iterative/dvc.org/issues/144#issuecomment-841960760 to make this clear?

shcheklein commented 3 years ago

Yes, my main proposal is to remove the /guide level

Please, let's not do this. A few reasons from the top of my head:

shcheklein commented 3 years ago

/guide/ is a meaningless level - apparently only exists to collapse irrelevant info for those looking for

I don't agree. We have never spent enough time to do it right - that's why it looks this way. If we move stuff out, then all docs will look like a mess. If we plan how to move things and instead reorganize inside - it'll look better.

shcheklein commented 3 years ago

How tos are technically tutorials, but they have a very precise angle - how to solve a very niche problem, vs general tutorials. You see the difference by their titles.

jorgeorpinel commented 3 years ago

Let's first try to properly consolidate things under Guide and then we can decide?

OK. So would reorganizing most guides into 4 topics (as sub-levels of /doc/guide) be a good first step? Basic Concepts can stay in the beginning of the Guide section (before the 4 topics). How To and Outlier pages can be after (I think those are found most often from searches or links from other docs).

...
Guide
  DVC Concepts ...
  Data Management (1) ...
  Modeling Pipelines (2) ...
  ML Model Optimization (3) ...
  Experiment Tracking (4) ...
  How to ...
  Troubleshooting
  Contributing

* Names aren't final

Cc @casperdcl ☝️

iesahin commented 3 years ago

I don't have strong opinions on this. I'm mostly a search guy and never had difficulty navigating the docs. :)

jorgeorpinel commented 3 years ago

Right. It's not just about the nav though. This organization will help us compartmentalize the areas of docs better. It should help with planning and knowing we are covering enough types of docs for enough types of users.

casperdcl commented 3 years ago

The idea behind the existing structure is to reflect 3-4 major parts of any docs - Refs, Guides, Quick Start, etc.

Hmm I'm thinking about a traditional book, report, article, journal etc:

  1. cruft
  2. main body
  3. cruft

AKA:

  1. intro, preface, acknowledgements, installation, etc.
  2. main body
  3. conclusion, bibliography, glossary, index, command reference, etc.

Where the unique thing about the "main body" is that (unlike all the other parts) it's never actually labelled. You don't have a chapter titled "main body" - at most you use different numbering to make a distinction, but not a nested level, e.g:

i. Use Cases
ii. Installation
1. Data Registry
2. ML Pipelines
3. ...
A. Command Ref
B. API Ref

So basically I'm not accustomed to the idea of a main body (in this case labelled guide) nesting.

shcheklein commented 3 years ago

Interesting analogy! To me docs != book (for a lot of reasons). But if you prefer that analogy - User Guide == DVC Book for me and should be structured the way you described. Other sections = marketing materials, quick brochures around, etc.

casperdcl commented 3 years ago

:shrug:

jorgeorpinel commented 2 years ago

From chat with @dmitry. Some feedback from advanced DVC users on docs experience:

Docs now are too command-centric; people can sometimes get lost in the details; unclear where to find the information they need (or what they even need in the first place).

p.s. @jendefig also mentioned this. More detailed notes here (internal)

Idea on the structure of experience (not an exact map to sitemap necessarily):

  1. Landing on home page isn't showing a clear BIG picture of where in the ML process DVC fits (solution positioning)

    DVC website doesn't have to be only about DVC -- it's about data science/ machine learning workflows

  2. High-level scenarios (use cases?) e.g. workflow for ML exps; exp mgmt; versioning; GitOps; etc.

    We can do a better job of referring to other products (MLEM, CML, etc.) at this level

  3. How do we teach the tools? Quick Start: single introductory section to grasp all the scenarios (features); vs. Get Started (tutorials): still introductory but go into more detail (take more time)
  4. Usage Guides (explanations and/or walkthroughs) & References (purpose details, semantics)
  5. Everything else? (How-tos, Troubleshooting, etc.)

iesahin commented 2 years ago

Repeating the questions in https://www.notion.so/iterative/wip-Audiences-ccb7abb9a198476aa0c01581479c5eaf

The answers to these questions should shed some light on the docs organization. We can't be everything to everyone, so we must select the most obvious audience and make docs organization easier for them.

jendefig commented 2 years ago

@iesahin @jorgeorpinel While I do agree that organizing docs should be a function of audience, I think the need and request for having workflows isn't an audience question, it's because as the tools have more features, the docs grow and become harder to absorb. (It's a lot to read/take-in). We have to figure out a way to simplify the organization of the docs to seem less overwhelming. All the documentation on commands must be there. It's a question of ingestion and that ingestion experience.

Landing on home page isn't showing a clear BIG picture of where in the ML process DVC fits (solution positioning)

I agree with this one hundred percent. Need a big picture a-ha moment right from the start. And it needs to be a sharable image.

On 1. Are we sure that Use Case is the best term? We should use the language they are coming to us with. I'm curious what terminology was used in @dmpetrov's discussions with the same request. It is interesting that "command-centric" was used in what we both heard, but in what they were looking for instead, what terminology did they use? (What makes sense to our audience?) (Maykon definitely used "workflow")

Also regarding this in particular. Why don't we just crowdsource this from the Community? Ask the question: "To better serve your needs in our docs, we are asking for you to fill in and upvote items in this document (public in Notion?) on what workflows you would like to see covered in our docs. For example, finish the search 'How do I _____ ?'" (or however we should ask this).

Also from talking to Community members, it seems they go through steps in their usage/adoption. Maybe we need to change the buttons on the main page to these milestones? I want to:

  • Version my Data
  • Set up my pipeline
  • Version my Models
  • Version my Experiments
  • Examine my Results
  • Automate testing

Regarding the "Big picture" image, it would be awesome to be able to click on which section of the process they are dealing with to get to the corresponding docs about our tools. (Yes, I dream big ;) )

On 2. Quick start vs. get started? This would be confusing. Why not Features and Tutorials, with a search capability in each?

Not sure about 3 and 4. need more rumination.

jorgeorpinel commented 2 years ago

The top priority should be to have a clear big picture expressed in the home page: e.g. a figure covering the entire ML lifecycle, where DVC comes in, and why that's better.

Other than that, the larger challenge is achieving a docs structure that can fit a lot of content in a way that seems simple and clear. I'm not sure that can be done under a single tree like the navigation menu we have on the left side.

💡 Maybe we need an orthogonal separation by features (mapped to the big picture above) that essentially filter content depending on the focus of each user (similar to the proposed Get Started trails @iesahin has been working on). There would be some overlap in terms of explanation and reference docs, but a single Quick Start one-pager in each one and/or a bit longer Learn "tutorial?" (current G/S pages) + room for more complex tutorials if needed.

On the term "use cases" we can definitely change it to "scenarios", "solutions", "why DVC", or something else. They should probably get out of the Docs section and be proper landing pages instead of the current /features (this has been discussed several times). And in that case we don't even need to name them anything special, they'd just be in custom URL paths like dvc.org/data-versioning.

iesahin commented 2 years ago

What I had in mind talking about "audiences" is that the workflows and expectations of these audiences vary considerably. We don't have a "single basic workflow" as Git may have. DVC's MLOps workflow(s) are considerably different than Data Science workflow(s), and IMO the best way to model this is via answering the "who?" question.

If we start from "who", then it's easier to think about "what do these users do in 80% of their time with DVC?" question that will lead us to specific workflows.

On top of this, we can have several different ways to structure the documents, e.g., longer tutorials vs quick get started. I'd like to have some tests and feedback mechanisms to measure the merits of these before deciding.

casperdcl commented 2 years ago

Maybe we need to change the buttons on the main page to these milestones? I want to:

  • Version my Data
  • Set up my pipeline
  • Version my Models
  • Version my Experiments
  • Examine my Results
  • Automate testing

I like this idea. A popup perhaps linking to relevant "getting started" sections?

jorgeorpinel commented 2 years ago

UPDATE!

I think that this issue is too broad so we may never be satisfied with a definitive plan for this.

Also, the main remaining problem seems to be in the User Guide (book-like explanation content) and we have several "smaller" epics to tackle that. Specifically:

As well as a few important issues related to the Command Reference:

So we should probably close this, as long as those issues reflect the directions we intend, as described in https://github.com/iterative/dvc.org/issues/144#issuecomment-1000481879 above:

Docs now are too command-centric; people can sometimes get lost in the details; unclear where to find the information they need...

  1. Landing on home page isn't showing a clear BIG picture;
  2. High-level scenarios (use cases)
  3. How do we teach the tools? Quick Start
  4. Usage Guides (explanations and/or walkthroughs) & References (purpose details, semantics)
  5. Everything else? (How-tos, Troubleshooting, etc.)

The only other thing from ☝🏼 that's not covered in the mentioned issues is the home page redesign, but I can create a separate issue for that. WDYT @dmpetrov @shcheklein @casperdcl ? Thanks


Edit: there's also a related feature request/discussion:

casperdcl commented 2 years ago

I think that this issue is too broad so we may never be satisfied with a definitive plan for this.

to me we can definitely come up with some action points

dberenbaum commented 1 year ago

@jorgeorpinel and I discussed some of these issues. A summary of some of my takeaways:

cc @shcheklein

shcheklein commented 1 year ago

Alternative that I had in Slack (just as an idea):

jorgeorpinel commented 1 year ago

What is DVC (move related techs inside it)

Currently, I started disintegrating that one as we didn't see it fitting the goals of the user guide as stated above (high-level feature explanations).

However, 💡 we could turn it into an Overview instead, which would combine its existing Core Features section as well as Related Technologies. That way we take care of 2 birds (from the bullet list above) with one stone.

🐦 🐦 🪨 ☠️ ☠️

dberenbaum commented 1 year ago

4006 looks like a nice move in the right direction!

Alternative that I had in Slack (just as an idea):

What is DVC (move related techs inside it)
Project Str
Experiments Mng (Visualizing Plots should be inside)
Data Management (Remote, Large Data, External Data)
Troubleshooting
How to (move Windows inside, may even Google Drive for now)
Telemetry Policy
Glossary (I would remove it - I don't think it's useful tbh - why do we need it)?

Maybe we need an additional Pipelines Mgmt section in addition to Experiments Mgmt and Data Mgmt (esp since we just created this section 😄 )?

dberenbaum commented 1 year ago

Hm, now I see that pipelines is complicating things since pipelines are so tightly coupled to data management.

That could make sense if we position data management as including the pipelines to generate the data 🤔 .

jorgeorpinel commented 1 year ago

Maybe we need an additional Pipelines Mgmt section

We have https://dvc.org/doc/user-guide/pipelines and https://github.com/iterative/dvc.org/issues/2883.

data management as including the pipelines to generate the data

I think that for the Get Started it's OK to cover pipelining basics inside Data Management, but the feature set is so rich (and maybe has some non-data-centric uses) that it does deserve a user guide section. We can still mention and link a lot about it from data mgmt pages and vice versa.

jorgeorpinel commented 1 year ago

Glossary (I would remove it - I don’t think it’s useful tbh - why do we need it)?

No strong opinion but I like it. It's at the bottom of the section so it's not in the way either.

p.s. probably the page I visit most from Git docs is https://git-scm.com/docs/gitglossary (but then again I work a lot with wording and definitions).