iterative / studio-support

❓ DVC Studio Issues, Question, and Discussions
https://studio.iterative.ai

Show only commits that belong to branch (e.g. on main branch) #56

Open quantsnus opened 2 years ago

quantsnus commented 2 years ago

When new features or improvements are developed on separate branches and merged thereafter, DVC Studio shows all the commits for the master branch, including all intermediate commits from the feature branch.

My expectation is that the master branch shows only the changes resulting from the merges, e.g. "the released changes", and not all the mess one might have done during development and debugging of the features.

Currently this clutters the table view for the master branch, and the trend for master also looks messy, as the drops and wiggles there are caused only by debugging runs with different data (on the feature branch) and not by real experiments or anything related to the model's progress. It even seems to mix results that happened in parallel on different branches and were merged afterwards.

[image]

Actually, the DVC CLI behaves as I would have expected. Running dvc exp show -n 9 on the same repository with the master branch checked out shows just the much sparser history of the merges and the resulting changes in the metrics: [image]

Would it be possible to have a similar visualization or option in DVC Studio?

tapadipti commented 2 years ago

@quantsnus If you squash merge your changes, then you'll only see one (merge) commit in the master branch.

If you don't squash them, then the master branch will actually have all these commits, right? I think that if they are hidden, they could cause confusion because they appear in the branch in the repo but not in Studio. What do you think?

@Suor Would Studio be able to identify that these commits have been merged and may not need to be shown separately?

@quantsnus You can also manually hide the extra commits.

Suor commented 2 years ago

It looks to me like the most natural way to solve this is to squash merge to master, or to rebase the branch before merging and remove all undesired commits in the process.

As far as I understand, experimenting in a branch and dvc exp are alternatives, i.e. dvc exp experiments are not commits in any branch. So dvc exp show doesn't even show commits in any branch; it shows experiments derived from HEAD by default. So this is a completely different story.

quantsnus commented 2 years ago

@tapadipti

> @quantsnus If you squash merge your changes, then you'll only see one (merge) commit in the master branch.

and @Suor

> It looks to me the most natural way to solve this is squash merging to master or rebasing a branch before merging, removing all undesired commits during that.

I considered squashing on merge, but we actually have a reason not to squash: in many cases we want to be able to reproduce every step. Squashing would be more of a workaround for the situation, and it would require significant changes to our workflow.

However, in the high-level overview (which we might want to show to management) all the intermediate steps should be hidden. So I would like a toggle that hides all the commits that originate from a feature development branch and shows just the merge commits.
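To illustrate what such a toggle could compute: in Git terms, "just the merge commits on master" roughly corresponds to following only the first parent of each commit from the branch tip, which is what git log --first-parent does. Below is a minimal sketch of that idea on a hypothetical commit graph. The commit names and the PARENTS mapping are made up for illustration, and this is not how Studio works internally.

```python
# Hypothetical commit graph: each commit maps to its list of parents.
# For a merge commit, the first parent is the branch that was merged into.
PARENTS = {
    "m0": [],
    "f1": ["m0"],          # feature-branch commit
    "f2": ["f1"],          # feature-branch commit
    "m1": ["m0", "f2"],    # merge of the first feature branch into master
    "g1": ["m1"],          # commit on a second feature branch
    "m2": ["m1", "g1"],    # merge of the second feature branch into master
}


def all_ancestors(tip, parents):
    """Every commit reachable from tip, i.e. the full history of the branch."""
    seen, stack = [], [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.append(commit)
            stack.extend(parents[commit])
    return seen


def first_parent_chain(tip, parents):
    """Only the commits reached by following the first parent from tip."""
    chain = []
    commit = tip
    while commit is not None:
        chain.append(commit)
        commit = parents[commit][0] if parents[commit] else None
    return chain


print(sorted(all_ancestors("m2", PARENTS)))   # full history, feature commits included
print(first_parent_chain("m2", PARENTS))      # merge commits and direct commits only
```

Note that this only helps when merges create a merge commit. After a fast-forward merge, the feature commits themselves become the first-parent history, so they would still show up.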


> If you don't squash them, then the master branch will actually have all these commits, right? I think that if they are hidden, they could cause confusion because they appear in the branch in the repo but not in Studio. What do you think?

I think that if hiding the commits is something the user can select, there should be no confusion. You already have the feature to auto-hide "irrelevant commits", to which the same argument would apply.

> @quantsnus You can also manually hide the extra commits.

Yes, but this is very tedious and uncomfortable to do when it comes to tens or hundreds of commits.


> As far as I understand experimenting in a branch and dvc exp are alternatives, i.e. dvc exp experiments are not commits in any branch. So dvc exp show doesn't even show commits in any branch, it shows experiments derived from HEAD by default. So this is completely different story.

That is not how I have experienced dvc exp show to work. Yes, it can show everything one did using dvc exp run, which as I understand it are not git commits, but it also shows experiments one did by running dvc repro and committing after the run (the latter is effectively an experiment too). We use the latter more frequently than the experiment feature. And dvc exp show does a good job of showing the results, while DVC Studio doesn't, as I tried to show in the original post.


I'd also like to emphasize one point of my original post a little more:

> It even seems to mix results that happened in parallel on different branches and were merged afterwards.

Imagine a scenario where two people develop something in parallel.

One is fine-tuning some hyperparameters (mostly giving high metric results); the other is optimizing the code for speed, using a subset of the training data for quick turnaround of benchmarks (resulting in poor metrics).

If both do this for a week, committing intermittently and then merging one after the other, I expect the resulting trend graph will oscillate terribly between the two metric values during this hypothetical development week. Even worse than in the graph I showed above.
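To make the oscillation concrete, here is a small sketch with made-up numbers. It assumes the trend plots every commit reachable from master in date order, which is my assumption about the view and not something confirmed in this thread. The branch names and metric values are hypothetical.

```python
# Hypothetical data: (commit day, branch, metric) for two branches developed in parallel.
tuning = [(1, "tuning", 0.90), (3, "tuning", 0.91), (5, "tuning", 0.92)]
speedup = [(2, "speedup", 0.60), (4, "speedup", 0.61), (6, "speedup", 0.62)]

# After both branches are merged without squashing, ordering all commits
# by date interleaves the two series.
merged_by_date = sorted(tuning + speedup)
trend = [metric for _, _, metric in merged_by_date]

# Size of the jump between consecutive points in the trend.
jumps = [round(abs(b - a), 2) for a, b in zip(trend, trend[1:])]

print(trend)
print(jumps)
```

Each branch on its own improves steadily by 0.01 per commit, yet every step in the interleaved trend is a jump of roughly 0.3, because consecutive points come from different branches.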

tapadipti commented 2 years ago

@quantsnus Thanks for sharing more details. I've created an internal ticket to discuss this. Will keep you updated here.

quantsnus commented 2 years ago

Thanks @tapadipti, I have one more observation for your consideration.

If one hides unwanted commits from the table, their metrics still show up in the trend graph. So there is no way, not even manually, to clean up the trend from failed experiments in the merged main branch.

tapadipti commented 2 years ago

@quantsnus Thank you for the feedback. The issue with Trends is something we already have in our backlog. We will keep you posted on the status.

tapadipti commented 1 year ago

@quantsnus Trends now display only those commits that aren't hidden. Can you please confirm that it works for you?

quantsnus commented 1 year ago

@tapadipti Thank you. I can confirm that the trend now shows only non-hidden commits.

I realized that this thread should have been two, because my main request (to automatically hide commits that originate from feature branches and show only commits from the main branch) is not addressed by this. Are you looking into that one too?

tapadipti commented 1 year ago

@quantsnus We have an internal ticket to look into your original request, but we're not working on it at the moment. I'll ping you here when we work on it.

quantsnus commented 1 year ago

Just for your interest, the scenario I outlined previously happened this week in real life:

> Imagine a scenario where two people develop something in parallel.
>
> One is fine-tuning some hyperparameters (mostly giving high metric results); the other is optimizing the code for speed, using a subset of the training data for quick turnaround of benchmarks (resulting in poor metrics).
>
> If both do this for a week, committing intermittently and then merging one after the other, I expect the resulting trend graph will oscillate terribly between the two metric values during this hypothetical development week.

[image]