Closed jorgeorpinel closed 4 years ago
Will you please give an example of this for better understanding? @jorgeorpinel
@ryokugyu for example:
How it works - should be reviewed, simplified, made actionable, and moved to the user guide
What is DVC - reviewed, moved to the user guide
etc.
Can I take this up?
Hi @Naba7 thanks for the interest. Will you be dropping issue #463 after all?
@Naba7 I agree with @jorgeorpinel, let's do issues one by one, or at most two in parallel. Quality is important here, not speed.
Yes, I agree @jorgeorpinel @shcheklein. It will be better to focus on one issue rather than fumbling over many. Sure, I will work on #463.
I am appending here my ideas about how to reorganize "Understanding DVC", but it is clear that those pages need to be heavily rewritten and updated (not just moved around or relocated on the menu).
User Guide
[ ] Introduction
Why DVC
https://dvc.org/doc/understanding-dvc/collaboration-issues
What is DVC
https://dvc.org/doc/understanding-dvc/what-is-dvc
https://github.com/iterative/dvc.org/issues/53#issuecomment-485588486
Other Resources
https://dvc.org/doc/understanding-dvc/resources
Maybe some of the resources can be referenced from the other sections in the Introduction, and this section be removed.
I am appending here my ideas about how to reorganize "Understanding DVC", but it is clear that those pages need to be heavily rewritten and updated (not just moved around or relocated on the menu)...
This structure looks good :+1: I was wondering whether we can move The Privacy Policy structure as discussed in https://github.com/iterative/dvc.org/pull/987#issuecomment-584407710 and #144(comment) under one of these sections?
Actually that proposal while great when it was done may be outdated right now. The goal now is to reduce the amount of pages and absorb the unique information from Understanding DVC pages into existing User Guide pages as much as possible.
I was wondering whether we can move The Privacy Policy structure as discussed in #987 (comment) and #144(comment) under one of these sections?
Good finds (possibly removing GAPP from footer and from nav sidebar, which is now possible BTW). However, this is all unrelated to this issue.
Also, @srishti-nema as you're already assigned to #540 please focus on that for now.
I am appending here my ideas about how to reorganize "Understanding DVC", but it is clear that those pages need to be heavily rewritten and updated (not just moved around or relocated on the menu)...
I have created a similar indexed version of absorbing DVC. I have incorporated certain facts that make it seem more in place, developing from how ML evolved to asking the right questions now. @jorgeorpinel, @shcheklein, if it's okay, should I create a PR?
@VANRao-Stack sounds cool! could you outline the changes first?
Yeah, definitely! I began by creating two subsections within understanding the problem: I started with issues in current ML practices, and then moved to asking the right questions in order to resolve them. I provided a couple of historical pointers to show how similar thinking led to the growth of ML/DL in the first place.
NOTE: I did not change the questions posed, as I thought they were sufficient here.
I then created another section containing two sub-sections: one explaining the technologies currently on the market (comparison to existing technologies), followed by one describing what experiment management software means.
NOTE: I'm still not sure about existing technologies and am having a little difficulty editing it. I don't think this section should be placed here, but rather towards the end, right before core features or before Resources. Here I could instead add a section on why today's technology isn't good enough for development.
The next is a section on what DVC is and a couple of its key features. As mentioned in #53, I noticed that there wasn't a keyword on DVC files and remote, so I added those.
Next is core features. As I mentioned in the previous note, I think we can move the existing technologies here, since a comparison with the existing technologies would clearly be suitable here, along with a couple of extra notes.
And finally how-it-works and resources, both of which I kept unedited, since I don't think a change was in order.
@VANRao-Stack while some of the things you mention seem to make sense, I don't think your message outlines the suggested changes.
Can you try to describe the changes you are proposing in a very clear way? A list, a table, etc. something visual and easy to grasp as opposed to a recollection of your thought process combined with specific actions. Please help us understand your plan here 🙂
Thanks!
Yeah sure!
Start with collaboration issues: review it and place it at the beginning, after giving it a little more description with respect to the context.
Move on to asking the right questions: similarly, review it and add or delete content with respect to the context.
Next section: show how the current technologies aren't enough.
After a brief description of what an experiment management software is, formally define what dvc is, add the rest of keywords taken from #53 here.
Another round of comparison between the existing technologies and dvc, and briefly explain its core features.
How it works section was asked to be reviewed and made more actionable. I do not understand this, so I have left it as it is for now.
The rest of the user guide remains pretty much the same, with an extra section on Resources at the very end.
I hope this clearly outlines my idea, if this isn't what was planned please let me know about it, any suggestions or opinions are also welcome.
Thanks @VANRao-Stack. I still don't see a specific plan of what parts of Understanding DVC will go where in other existing doc sections. We'd like to see such a "map" as an initial plan. You also seem to have the intention to improve some of the content in question, which isn't bad, but that's not within the scope of this task, just deciding which parts to preserve, moving them, and changing their wording as needed to fit the new context...
How about this: just focus on one page (for example https://dvc.org/doc/understanding-dvc/collaboration-issues) and try to detail where you would move all that content into other sections of the docs, or which parts to just remove and why. You can submit a PR directly if that's easiest for you, or just post another comment with the plan here.
Oh, I had actually used the plan of where everything would be moved according to a previous comment by @dashohoxha that has been marked as resolved right now. I will create a PR so you can better see what I intended.
I see. Yeah, sorry that plan is really old. I mentioned we needed a new strategy in https://github.com/iterative/dvc.org/issues/425#issuecomment-631716646 above. Again, please focus on a single page of Understanding DVC for now and lmk when your plan for that content is ready to check. Thanks
Alright, since I am only supposed to work with one page, I propose we move related technologies to the end of the use cases section. After a couple of articles on workflows and other such scenarios, comparing it to related technologies would directly show how DVC brings something new to the table.
@jorgeorpinel , @shcheklein What do you think?
@VANRao-Stack I think it might be confusing to be honest. My take on this is that it should become a part of the User Guide in some way. But the first question to answer: what value do we expect/why do we have this section in the first place?
Well, the answer to the question you posed about why we have this section in the first place is, I think, that at first glance DVC seems like just another tool like Git, but it isn't, and this section shows how DVC differs from the existing technologies. The whole point of having a use case section was to make people familiar with the different cases in which DVC could be deployed, hence having a comparison between the existing technologies and DVC would help make this point firm in the readers' minds.
move related technologies to the end of the use cases section. After a couple of articles on workflows and other such scenarios...
@VANRao-Stack Use Cases already talk about the benefits of DVC so I'm not sure about your reason either. Besides, this issue specifies the goal is to absorb Understanding DVC into Get Started or (most likely) User Guide.
what value do we expect/why do we have this section in the first place?
This is a tricky question actually. The reason for Understanding DVC as a whole was probably originally something that is no longer a priority because it's covered by other sections like Get Started (or the glossary/ proposed Basic Concepts) better.
The Related Technologies page seems to me like it had the intention to attract users from those tools who find them limited when dealing with data, and/or to answer FAQs like "isn't this just like Git-LFS?". Another message sent by that page, as Abhijith mentioned, is a summary of the features that DVC covers, which currently would require a bunch of other tools to put together. So perhaps this one page could be moved into a new User Guide page, titled something like "Migrating from Related Tools", or absorbed into the proposed How-to section (see #899 epic).
I think this discussion reveals that at least parts of this issue are not very introductory, so I removed the good-first-issue label. Maybe you should try another one @VANRao-Stack... sorry for the back and forth. If you can find something much more defined, with a specific description that is crystal clear as to what should be done, that would be best for now I think.
OR if you want a small subtask that helps advance this issue, how about this:
For example the first bullet in https://dvc.org/doc/understanding-dvc/how-it-works is pretty much covered in a few places: https://dvc.org/doc/tutorials/get-started/initialize and https://dvc.org/doc/command-reference/init
That should be a well-scoped and relatively easy PR to review.
So I was working on the Core Features page, as you had suggested. Clearly most of the points mentioned are redundant:
DVC works on top of Git repositories and has a similar command line interface and Git workflow.
DVC is Programming language agnostic: Python, R, Julia, shell scripts, etc. as well as ML library agnostic: Keras, Tensorflow, PyTorch, Scipy, etc.
DVC supports cloud storage (Amazon S3, Microsoft Azure Blob Storage, Google Cloud Storage, etc.) for data sources and pre-trained model sharing.
These points have already been mentioned several times elsewhere in the docs. The only new point that I'm seeing is probably that DVC is open source and self serve. What should I do here? If I delete everything that's redundant, it would leave the page pretty much empty. @jorgeorpinel
On https://dvc.org/doc/understanding-dvc/core-features
These points have been mentioned several times in the docs before that
Maybe you're onto something here, yes. But let's check on a case-by-case basis. Can you say where each of these points is mentioned in other docs? Also, you only listed 3 out of the 6 bullets in that doc; let's check all. You're probably right about some of them at least.
Core features are important to have near the top of introductory docs, so hopefully these other mentions are not buried in very specific areas.
Can you say where each of these points are mentioned in other docs? Also you only listed 3 out of the 6 bullets in that doc, let's check all...
I'll mention each bullet and the links where it has been previously mentioned:
The second bullet about reproducibility (2) is covered very clearly in the deep dive tutorial: https://dvc.org/doc/tutorials/deep/reproducibility#reproducibility
Similarly, the large data file versioning (3) is covered in https://dvc.org/doc/user-guide/large-dataset-optimization#large-dataset-optimization
It was mentioned that DVC is agnostic (4) here: https://dvc.org/doc/tutorials/get-started/agenda#agenda
DVC's support for cloud storage (6) is clearly explained here: https://dvc.org/doc/tutorials/get-started/configure#configure
Sorry for the delay @VANRao-Stack I'm pretty busy this week and weekend. Feel free to submit PRs meanwhile.
- reproducibility (2) is covered very clearly in the deep dive tutorial
- large data file versioning (3) is covered in ...#large-dataset-optimization
Those are not introductory docs, except for Get Started. Again: "Core features are important to have near the top of introductory docs, so hopefully these other mentions are not buried in very specific areas." (from https://github.com/iterative/dvc.org/issues/425#issuecomment-634390312)
But maybe this concept is also repeated in a more introductory doc?
- DVC was agnostic (4) here .../get-started/agenda#agenda
Yes! But I think this note should be added directly to the docs home, https://dvc.org/doc, instead.
DVC's support for cloud storage (6) is clearly explained here: .../get-started/configure#configure
Agree about this one.
You're still missing points 1 and 5...
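For reference, the kind of "map" requested earlier in this thread could be tracked as a small table. The sketch below is only an illustration: the bullet numbers and URLs are the ones posted in the comments above, while the names `covered_in`, `TOTAL_BULLETS`, and `unmapped_bullets` are hypothetical and not part of any DVC tooling.

```python
# Hypothetical map of Core Features bullets to docs that already cover them.
# Bullet numbers and URLs are taken from the comments above; nothing else is assumed.
TOTAL_BULLETS = 6  # the Core Features page is said to have 6 bullets

covered_in = {
    2: "https://dvc.org/doc/tutorials/deep/reproducibility#reproducibility",
    3: "https://dvc.org/doc/user-guide/large-dataset-optimization#large-dataset-optimization",
    4: "https://dvc.org/doc/tutorials/get-started/agenda#agenda",
    6: "https://dvc.org/doc/tutorials/get-started/configure#configure",
}


def unmapped_bullets(mapping, total):
    """Return the bullet numbers that have no target doc yet."""
    return [n for n in range(1, total + 1) if n not in mapping]


# Bullets that still need a location (or a decision to drop them).
print(unmapped_bullets(covered_in, TOTAL_BULLETS))
```

Extending the dictionary with one entry per remaining bullet, plus a note on whether the target is an introductory doc, would give reviewers the whole plan for this page at a glance.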
Files currently located in
/static/docs/understanding-dvc/
Remove all the Understanding DVC docs and just rescue all that is unique and valuable into the get started and user guide docs.
UPDATES: