iterative / dvc.org

📖 DVC website and documentation
https://dvc.org
Apache License 2.0

user-guide/get-started: reorganize "Understanding DVC" section #425

Closed jorgeorpinel closed 4 years ago

jorgeorpinel commented 5 years ago

Files currently located in /static/docs/understanding-dvc/

Remove all the Understanding DVC docs and just rescue all that is unique and valuable into the get started and user guide docs.

UPDATES:

ryokugyu commented 5 years ago

Will you please give an example of this for better understanding? @jorgeorpinel

shcheklein commented 5 years ago

@ryokugyu for example:

  • How it works: should be reviewed, simplified, made actionable, and moved to the User Guide
  • What is DVC: reviewed and moved to the User Guide

etc.

dnabanita7 commented 5 years ago

Can I take this up?

jorgeorpinel commented 5 years ago

Hi @Naba7 thanks for the interest. Will you be dropping issue #463 after all?

shcheklein commented 5 years ago

@Naba7 I agree with @jorgeorpinel, let's do issues one by one, or at least two in parallel. Quality is important here, not speed.

dnabanita7 commented 5 years ago

Yes, I agree @jorgeorpinel @shcheklein . It will be better to focus on one issue rather than fumbling over many. Sure, I will work on #463 .

dashohoxha commented 4 years ago

I am appending here my ideas about how to reorganize "Understanding DVC", but it is clear that those pages need to be heavily rewritten and updated (not just moved around or relocated on the menu).

srishti-nema commented 4 years ago

I am appending here my ideas about how to reorganize "Understanding DVC", but it is clear that those pages need to be heavily rewritten and updated (not just moved around or relocated on the menu)...

This structure looks good :+1: I was wondering whether we can move The Privacy Policy structure as discussed in https://github.com/iterative/dvc.org/pull/987#issuecomment-584407710 and #144(comment) under one of these sections?

jorgeorpinel commented 4 years ago

Actually, that proposal, while great when it was made, may be outdated by now. The goal now is to reduce the number of pages and absorb the unique information from the Understanding DVC pages into existing User Guide pages as much as possible.

I was wondering whether we can move The Privacy Policy structure as discussed in #987 (comment) and #144(comment) under one of these sections?

Good finds (possibly removing GAPP from footer and from nav sidebar, which is now possible BTW). However, this is all unrelated to this issue.

Also, @srishti-nema as you're already assigned to #540 please focus on that for now.

VANRao-Stack commented 4 years ago

I am appending here my ideas about how to reorganize "Understanding DVC", but it is clear that those pages need to be heavily rewritten and updated (not just moved around or relocated on the menu)...

I have created a similar indexed version of absorbing DVC. I have incorporated certain facts that make it seem more in place, developing from how ML evolved to asking the right questions now. @jorgeorpinel, @shcheklein, if it's okay, should I create a PR?

shcheklein commented 4 years ago

@VANRao-Stack sounds cool! could you outline the changes first?

VANRao-Stack commented 4 years ago

@VANRao-Stack sounds cool! could you outline the changes first?

Yeah, definitely! I began by creating two subsections within understanding the problem: the first covers issues in current ML practices, and the second asks the right questions in order to resolve them. I provided a couple of historical pointers to show how similar thinking led to the growth of ML/DL in the first place.

NOTE: I did not change the questions posed, as I thought they were sufficient here.

I then created another section, containing two other sub-sections, with one explaining about the technologies currently in the market (comparison to existing technologies), then followed by a section describing what Experiment software Management meant.

NOTE: I'm still not sure about existing technologies and am having a little difficulty editing it. I don't think this section should be placed here, but instead towards the end, right before core features or before Resources. Here I could instead make a section on why technology today isn't good enough for development.

The next is a section on what DVC is and a couple of its key features, and as mentioned in #53, I noticed that there wasn't a keyword on DVC files and remote, so I added the same.

The next is core features and, as I mentioned in the previous note, I think we can move the existing technologies here, since a comparison with the existing technologies would clearly be suitable here, along with a couple of extra notes.

And finally how-it-works and resources, both of which I kept unedited, since I don't think a change was in order.

jorgeorpinel commented 4 years ago

@VANRao-Stack while some of the things you mention seem to make sense, I don't think your message outlines the suggested changes.

Can you try to describe the changes you are proposing in a very clear way? A list, a table, etc. something visual and easy to grasp as opposed to a recollection of your thought process combined with specific actions. Please help us understand your plan here 🙂

Thanks!

VANRao-Stack commented 4 years ago

@VANRao-Stack while some of the things you mention seem to make sense, I don't think your message outlines the suggested changes. Can you try to describe the changes you are proposing in a very clear way? A list, a table, etc. something visual and easy to grasp as opposed to a recollection of your thought process combined with specific actions. Please help us understand your plan here 🙂 Thanks!

Yeah sure!

I hope this clearly outlines my idea, if this isn't what was planned please let me know about it, any suggestions or opinions are also welcome.

jorgeorpinel commented 4 years ago

Thanks @VANRao-Stack. I still don't see a specific plan of what parts of Understanding DVC will go where in other existing doc sections. We'd like to see such a "map" as an initial plan. You also seem to have the intention to improve some of the content in question, which isn't bad, but that's not within the scope of this task, just deciding which parts to preserve, moving them, and changing their wording as needed to fit the new context...

How about this: just focus on one page (for example https://dvc.org/doc/understanding-dvc/collaboration-issues) and try to detail where you would move all that content into other sections of the docs, or which parts to just remove and why. You can submit a PR directly if that's easiest for you, or just post another comment with the plan here.

VANRao-Stack commented 4 years ago

Thanks @VANRao-Stack. I still don't see a specific plan of what parts of Understanding DVC will go where in other existing doc sections. We'd like to see such a "map" as an initial plan. You also seem to have the intention to improve some of the content in question, which isn't bad, but that's not within the scope of this task, just deciding which parts to preserve, moving them, and changing their wording as needed to fit the new context... How about this: just focus on one page (for example https://dvc.org/doc/understanding-dvc/collaboration-issues) and try to detail where you would move all that content into other sections of the docs, or which parts to just remove and why. You can submit a PR directly if that's easiest for you, or just post another comment with the plan here.

Oh, I had actually used the plan of where everything would be moved according to a previous comment by @dashohoxha that has been marked as resolved right now. I will create a PR so you can better see what I intended.

jorgeorpinel commented 4 years ago

I see. Yeah, sorry that plan is really old. I mentioned we needed a new strategy in https://github.com/iterative/dvc.org/issues/425#issuecomment-631716646 above. Again, please focus on a single page of Understanding DVC for now and lmk when your plan for that content is ready to check. Thanks

VANRao-Stack commented 4 years ago

Alright, since I am only supposed to work with one page, I propose we move related technologies to the end of the use cases section. After a couple of articles on workflows and other such scenarios, comparing it to related technologies would directly show how DVC brings something new to the table.

@jorgeorpinel , @shcheklein What do you think?

shcheklein commented 4 years ago

@VANRao-Stack I think it might be confusing, to be honest. My take on this is that it should become a part of the User Guide in some way. But the first question to answer: what value do we expect/why do we have this section in the first place?

VANRao-Stack commented 4 years ago

@VANRao-Stack I think it might be confusing, to be honest. My take on this is that it should become a part of the User Guide in some way. But the first question to answer: what value do we expect/why do we have this section in the first place?

Well, the answer to the question you posed about why we have this section in the first place is, I think, that at first glance DVC seems like just another tool like Git, but it isn't, and this section shows how DVC differs from the existing technologies. The whole point of having a use case section was to make people familiar with the different cases in which DVC could be deployed, hence having a comparison between the existing technologies and DVC would help make this point firm in the readers' minds.

jorgeorpinel commented 4 years ago

move related technologies to the end of the use cases section. After a couple of articles on workflows and other such scenarios...

@VANRao-Stack Use Cases already talk about the benefits of DVC so I'm not sure about your reason either. Besides, this issue specifies the goal is to absorb Understanding DVC into Get Started or (most likely) User Guide.

what value do we expect/why do we have this section in the first place?

This is a tricky question actually. The reason for Understanding DVC as a whole was probably originally something that is no longer a priority because it's covered better by other sections like Get Started (or the glossary/proposed Basic Concepts).

The Related Technologies page seems to me like it had the intention to attract users from those tools who find them limited when dealing with data, and/or to answer FAQs like "isn't this just like Git-LFS?". Another message sent by that page, as Abhijith mentioned, is a summary of the features that DVC covers, which currently would require a bunch of other tools to put together. So perhaps this one page could be moved into a new User Guide indeed, titled something like "Migrating from Related Tools", or absorbed into the proposed How-to section (see #899 epic).


I think this discussion reveals that at least parts of this issue are not very introductory, so I removed the good-first-issue label. Maybe you should try another one @VANRao-Stack... sorry for the back and forth. If you can find something much more defined, with a specific description that is crystal clear as to what should be done, that would be best for now I think.

jorgeorpinel commented 4 years ago

OR if you want a small subtask that helps advance this issue, how about this:

For example the first bullet in https://dvc.org/doc/understanding-dvc/how-it-works is pretty much covered in a few places: https://dvc.org/doc/tutorials/get-started/initialize and https://dvc.org/doc/command-reference/init

That should be a well-scoped and relatively easy PR to review.

VANRao-Stack commented 4 years ago

So I was working on the Core Features page, as you had suggested. Clearly most of the points mentioned are redundant:

These points have been mentioned several times in the docs before that. The only new point that I'm seeing is probably that DVC is open source and self-serve. What should I do here? Because if I delete everything that's redundant here, it would pretty much make the entire page empty. @jorgeorpinel

jorgeorpinel commented 4 years ago

On https://dvc.org/doc/understanding-dvc/core-features

These points have been mentioned several times in the docs before that

Maybe you're onto something here, yes. But let's check on a case-by-case basis. Can you say where each of these points is mentioned in other docs? Also you only listed 3 out of the 6 bullets in that doc, let's check all. You're probably right about some of them at least.

Core features are important to have near the top of introductory docs, so hopefully these other mentions are not buried in very specific areas.

VANRao-Stack commented 4 years ago

Can you say where each of these points is mentioned in other docs? Also you only listed 3 out of the 6 bullets in that doc, let's check all...

I'll mention each bullet and the links where it has been previously mentioned:

jorgeorpinel commented 4 years ago

Sorry for the delay @VANRao-Stack I'm pretty busy this week and weekend. Feel free to submit PRs meanwhile.

  • reproducibility (2) is covered very clearly in the deep dive tutorial
  • large data file versioning (3) is covered in ...#large-dataset-optimization

Those are not introductory docs, except for Get Started. Again: "Core features are important to have near the top of introductory docs, so hopefully these other mentions are not buried in very specific areas." (from https://github.com/iterative/dvc.org/issues/425#issuecomment-634390312)

But maybe this concept is also repeated in a more introductory doc?

  • DVC was agnostic (4) here .../get-started/agenda#agenda

Yes! But I think this note should be added directly to the docs home, https://dvc.org/doc, instead.

  • DVC's support for cloud storage (6) is clearly explained here: .../get-started/configure#configure

Agree about this one.

You're still missing points 1 and 5...