iterative / vscode-dvc

Machine learning experiment tracking and data versioning with DVC extension for VS Code
https://marketplace.visualstudio.com/items?itemName=Iterative.dvc
Apache License 2.0

Story: Compare other branches and commits in plots and the experiments table #1966

Closed dberenbaum closed 1 year ago

dberenbaum commented 2 years ago

In the Plots and Experiments views, is it possible to compare the workspace or experiments to past commits or branches? Right now, it seems limited to experiments based on the current commit. For example, in plots it would be nice to be able to enter any git ref here:

[Image: Screen Shot 2022-07-01 at 11 03 22 AM]

DavidGOrtega commented 2 years ago

I'm also preparing an issue regarding branches 😋

mattseddon commented 2 years ago

Close to a duplicate of #456. I think we only need one of these issues.

dberenbaum commented 2 years ago

I don't think we should be constrained by existing CLI flags here like --all-branches etc. VS Code might be able to provide way more flexibility to pick and choose specific refs from the git tree within the UI. I think this type of functionality could be broadly useful not just in this feature, but in other features like removing unwanted experiments or data.

sroy3 commented 2 years ago

#2392 displays the previous two commits. The next step could be to have a "show more commits" button at the bottom of the table. Each click could add two more commits (we can easily fine-tune that number) until there are no more to show.
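The "show more commits" behaviour described above can be sketched as a counter that grows by a fixed step and is capped at the number of commits available. This is only an illustration: `STEP`, `next_commit_count`, `can_show_more`, and the example totals are hypothetical names and values, not part of the extension.

```python
STEP = 2  # assumed step size; the comment above notes it is easy to tune


def next_commit_count(shown: int, total: int, step: int = STEP) -> int:
    """Return how many commits to show after one click on "show more commits"."""
    return min(shown + step, total)


def can_show_more(shown: int, total: int) -> bool:
    """The button only makes sense while some commits are still hidden."""
    return shown < total


# Hypothetical example: 2 commits displayed, 5 available in the history.
shown, total = 2, 5
steps = [shown]
while can_show_more(shown, total):
    shown = next_commit_count(shown, total)
    steps.append(shown)
```

Capping with `min` keeps the last click from overshooting when the remaining commits are fewer than the step.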

daveraghav commented 1 year ago

Is there a plan to show more than 2 previous commits in the VS Code extension's experiments table anytime soon?

shcheklein commented 1 year ago

@daveraghav yes! Just as a matter of getting feedback: how many commits would you prefer to see? Do you need to see branches?

daveraghav commented 1 year ago

Thanks for the update @shcheklein. Ideally, the more the better, either using 'Load More Commits' or something similar as mentioned by @sroy3, or by making it configurable in the extension settings. Also showing experiments from other branches would be great as multiple data scientists often work across parallel branches on the same project.

shcheklein commented 1 year ago

> Also showing experiments from other branches would be great as multiple data scientists often work across parallel branches on the same project.

Thanks, yes. That's exactly why I'm thinking about allowing people to pick and choose in some way what they want to see (vs trying to present all commits and all branches, which is too expensive and too much info). I'm thinking about what would be the best way to find/add commits to show. We can introduce a control like this:

By default the ribbon would always have the "current branch" + 2 commits in it.

dberenbaum commented 1 year ago

I guess the idea is to expand in the experiments table, and then those will be automatically shown as options in the plots view?

My take regarding the table:

  • Dropdown to pick a branch, tag, or commit SHA

I think picking from branches would be enough to start and less overwhelming for users.

> #2392 displays the previous two commits. The next step could be to have a "show more commits" button at the bottom of the table. Each click could add two more commits (we can easily fine-tune that number) until there are no more to show.

Agree with @sroy3 that "show more" is simpler and more intuitive than having a configurable number of commits per branch/item.

lainisourgod commented 1 year ago

+1 to this story. I'm confused about why I now only see the 2 previous commits and the experiments on them, without any option to show the others.

shcheklein commented 1 year ago

@lainisourgod we are working on this! are you interested in seeing other branches? any thoughts about the interface for this?

lainisourgod commented 1 year ago

I'm new to the whole experiments thing, so my workflow may not quite match the philosophy of DVC.

However, what I'd really want is to be able to choose from all experiments and to filter to see only some of them. I have a bunch of metrics on every run, and sometimes want to show in one table a set of experiments (e.g. baseline, added feature 1, added feature 2, added feature 1 and 2) so I can easily compare them.

In this case, experiments from other branches can be useful too, because I can add different features in different branches.

Also, I'd love to have a reasonably powerful search menu that covers commit messages, hashes and exp names.

lainisourgod commented 1 year ago

Also, as a small aside, it'd be nice to reproduce the dvc exp diff functionality in the Experiments view. E.g. set some experiment as a baseline, and for the other experiments show diffs in metrics and params instead of absolute values.
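The baseline idea above amounts to subtracting the baseline's values from each experiment's values. A minimal sketch, assuming metrics are available as flat name-to-number mappings (the function name, the metric names, and the values are all made up for illustration and are not how dvc exp diff is implemented):

```python
from typing import Dict


def diff_against_baseline(
    baseline: Dict[str, float], experiment: Dict[str, float]
) -> Dict[str, float]:
    """Return experiment minus baseline for every numeric key present in both."""
    return {
        key: experiment[key] - baseline[key]
        for key in baseline
        if key in experiment
        and isinstance(baseline[key], (int, float))
        and isinstance(experiment[key], (int, float))
    }


# Hypothetical metrics for a baseline and one experiment.
baseline = {"accuracy": 0.5, "loss": 0.25}
candidate = {"accuracy": 0.75, "loss": 0.125}
diffs = diff_against_baseline(baseline, candidate)
```

Keys missing from either side are skipped rather than reported, which is a design choice of this sketch and could just as well be surfaced as "added" or "removed".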

sroy3 commented 1 year ago

DVC has an option to show all branches: dvc exp show --all-branches or dvc exp show -a. When using this flag, it is impossible to set the number of commits to show at the same time. Here is the display of dvc exp show -n 3 -a:

[Image: Screenshot 2023-03-30 at 11 05 31 AM]

The simplest way to include the branches in the experiments table (first step) would be to have a toggle to switch between regular commits view vs. branches view. Here is a quick test of how this could look:

[Image: Screenshot 2023-03-30 at 11 15 13 AM]

The "Switch to Branch View" button will remove the "Previous Commits" rows and run dvc exp show -a under the hood. "Show More Commits" and "Show Less Commits" would be disabled and "Switch to Branch View" would be replaced by "Switch to Commits View". We can probably experiment a little with the style and placement (the previous image was created as a visual aid more than a mockup). Would that work for everyone as a first step?
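As a rough sketch of the toggle described above, the two views could map to two different argument lists. The function name and the `branches_view` flag are hypothetical, and using `--json` for machine-readable output is an assumption of this sketch; `-a` and `-n` are the flags mentioned in the comment, kept mutually exclusive here because the comment says they cannot be usefully combined.

```python
from typing import List


def exp_show_args(branches_view: bool, num_commits: int = 3) -> List[str]:
    """Build the argument list for one `dvc exp show` call.

    branches_view=True  -> all branches, no commit limit (-a)
    branches_view=False -> current branch, limited to num_commits (-n)
    """
    base = ["dvc", "exp", "show", "--json"]  # --json: assumed output format
    if branches_view:
        return base + ["-a"]
    return base + ["-n", str(num_commits)]


commits_view_cmd = exp_show_args(branches_view=False, num_commits=3)
branches_view_cmd = exp_show_args(branches_view=True)
```

The lists can be passed to `subprocess.run` as-is; nothing is executed here so the sketch runs without DVC installed.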

shcheklein commented 1 year ago

> DVC has an option to show all branches: dvc exp show --all-branches or dvc exp show -a. When using this flag, it is impossible to set the number of commits to show at the same time. Here is the display of dvc exp show -n 3 -a:

Yes, we need some DVC support for this. Most likely an option to pass arbitrary commits (revs) + -n to get their history. That is general enough to support all the cases, I think. Let's create a ticket on the DVC side and/or contribute if needed. cc @dberenbaum.

> Would that work for everyone as a first step?

Could you clarify which branch it would be switching to? How can we show multiple branches? Does it mean that we run two dvc exp show commands?

I think eventually we want this to be a single table with multiple sections per branch / tag, etc. It's needed to be able to compare things and plot things together from different branches.

dberenbaum commented 1 year ago

> Yes, we need some DVC support for this. Most likely an option to pass arbitrary commits (revs) + -n to get their history. That is general enough to support all the cases, I think. Let's create a ticket on the DVC side and/or contribute if needed. cc @dberenbaum.

As discussed, I hope this is not a large effort, but I wonder, especially with the results soon being cached, if it's better to make separate calls to exp show as a first step. WDYT?

Edit: It's not just about saving dev time on DVC but also about waiting to figure out what UI we really want.

shcheklein commented 1 year ago

I think it's clear that we want to see one table with multiple commits that belong to different branches, pretty much (I can't come up with an alternative to this, but I would be happy to hear thoughts). No matter if we show some history or not, etc., I think it doesn't change much: we'll need some support from DVC, and most likely it will be in some form similar to what I described. I don't see a large risk in implementing that, since I hope it's not a large effort and we can change params a bit if needed as we go.

> if it's better to make separate calls to exp show as a first step. WDYT?

It's definitely an extra effort that we'll need to replace later for sure (every additional command is more fragility + more overhead + lock contention in some cases, etc.). How much? I don't know. @sroy3 @mattseddon any thoughts?

dberenbaum commented 1 year ago

I guess what I'm wondering is whether it's easier/better to have one giant JSON with everything vs. independent JSON for different branches/commits that could be partially updated, shown in separate tables, etc.
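To make the trade-off concrete, here is a minimal sketch of the second option: results are kept per branch so one branch can be refreshed on its own, and are flattened into a single table only for display. The row shape, the function names, and the branch names are invented for illustration and do not reflect the real dvc exp show output.

```python
from typing import Dict, List

Row = Dict[str, str]  # hypothetical row shape, e.g. {"rev": "abc1234"}


def update_branch(
    cache: Dict[str, List[Row]], branch: str, rows: List[Row]
) -> Dict[str, List[Row]]:
    """Return a new cache where only `branch` has been replaced."""
    updated = dict(cache)
    updated[branch] = list(rows)
    return updated


def single_table(cache: Dict[str, List[Row]]) -> List[Row]:
    """Flatten the per-branch cache into one table, tagging rows by branch."""
    return [
        {"branch": branch, **row}
        for branch, rows in cache.items()
        for row in rows
    ]


cache: Dict[str, List[Row]] = {}
cache = update_branch(cache, "main", [{"rev": "abc1234"}])
cache = update_branch(cache, "feature", [{"rev": "def5678"}])
# Partial update: only "main" is refreshed, "feature" is left untouched.
cache = update_branch(cache, "main", [{"rev": "abc1234"}, {"rev": "0123abc"}])
table = single_table(cache)
```

With one giant JSON, `update_branch` would disappear and every refresh would rebuild the whole table; the flattening step would stay the same.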

shcheklein commented 1 year ago

Good question, Dave. From my past observations/experience (and that was my assumption) - it's always better to minimize the number of commands that we run (less fragile, faster, etc) + I think implementing these flexible, partial updates, or UI that supports that can also be on a different level of complexity. Again, would love to hear other opinions / thoughts on this.

sroy3 commented 1 year ago

>> Would that work for everyone as a first step?

> Could you clarify which branch it would be switching to? How can we show multiple branches? Does it mean that we run two dvc exp show commands?

It's actually all branches. Just like the DVC output I've posted.

shcheklein commented 1 year ago

> It's actually all branches. Just like the DVC output I've posted.

I see. I think it can become very expensive and noisy, tbh. I think we need to have a way to pick a branch (or branches) and show them. Implementation-wise, if it's not a huge effort, we can go with multiple commands and eventually migrate to a single one.

sroy3 commented 1 year ago

>> It's actually all branches. Just like the DVC output I've posted.

> I see. I think it can become very expensive and noisy, tbh. I think we need to have a way to pick a branch (or branches) and show them. Implementation-wise, if it's not a huge effort, we can go with multiple commands and eventually migrate to a single one.

There are currently no commands to show only one branch (unless I've misread the docs). I don't think it'd be that expensive. There aren't usually that many branches in a project. There could potentially be an almost infinite number of commits, but the number of branches should stay relatively low.

shcheklein commented 1 year ago

Okay, it can simplify the initial step a bit (no need for a mechanism to pick a branch). We still want to show multiple commits per branch + we'll need, almost as a next step, a way to hide branches that are not relevant (I have repos with 10+ branches; in ML people use branches extensively for experimentation in some cases, etc.). So I'm not sure we are saving anything. It's just a slightly different approach, but we'll end up in the same place: the ability to pick which branches / tags to show + some history in some cases + showing it all within a single table.

daavoo commented 1 year ago

> There are currently no commands to show only one branch (unless I've misread the docs).

The --rev option can be used to show only experiments derived from one branch. For example dvc exp show --rev try-large-dataset in https://github.com/iterative/example-get-started
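Building on the --rev suggestion, the "one command per branch" approach discussed earlier in the thread could look roughly like this. The function name is hypothetical, and combining `--rev` with `-n` and `--json` in a single call is an assumption of this sketch rather than something confirmed in the thread.

```python
from typing import List


def exp_show_per_branch(branches: List[str], num_commits: int) -> List[List[str]]:
    """Build one `dvc exp show` argument list per selected branch."""
    return [
        # --rev limits output to one branch; -n and --json are assumed to
        # combine with it as they do for the default (HEAD) case.
        ["dvc", "exp", "show", "--json", "--rev", branch, "-n", str(num_commits)]
        for branch in branches
    ]


# "try-large-dataset" is the branch from the example above; "main" is made up.
commands = exp_show_per_branch(["main", "try-large-dataset"], num_commits=3)
```

Each list can be handed to `subprocess.run` separately, which is the multiple-commands variant; a single-call variant would need DVC itself to accept several revisions at once.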

dberenbaum commented 1 year ago

@sroy3 Just a note to please not worry too much about what dvc exp show does. If you have other ideas for UI/UX, I'm sure we can either build it into dvc exp show or find some creative solution to show the right rows.

mattseddon commented 1 year ago

> @sroy3 Just a note to please not worry too much about what dvc exp show does. If you have other ideas for UI/UX, I'm sure we can either build it into dvc exp show or find some creative solution to show the right rows.

Especially with https://github.com/iterative/dvc/pull/9170 getting integrated soon.

sroy3 commented 1 year ago

Branches view was released with 0.7.1. Here is what it looks like: https://www.loom.com/share/50307b14f4c04f429ee5570f1dcc93ee

Currently working on adding individual branches with previous commits.

sroy3 commented 1 year ago

With https://github.com/iterative/dvc/issues/9390 being closed, we'll be able to use one dvc call instead of one per branch.

sroy3 commented 1 year ago

#3827 is currently in review.

Since that PR touches a lot of the data, I'll wait until this gets merged before implementing the multiple branches in a single call (https://github.com/iterative/dvc/pull/9391#issuecomment-1531799956).

I'll also do any follow-ups linked to user experience first, as the new call is just a convenience for us and not visible to the user.

Some follow-ups that I currently have on my list are:

There will probably be more after the review, or feel free to add more.