iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

In monorepo `dvc exp remove -A` should remove only experiments within the sub-dir project scope #10241

Open mnrozhkov opened 9 months ago

mnrozhkov commented 9 months ago

Summary / Background

I'm testing DVC Experiments for a monorepo scenario and encountered unexpected behavior with dvc exp remove.

Within a monorepo, say we have:

- / 
  - project_a
  - project_b
  - root_content

When working inside project_a, I list experiments with

dvc exp list  

and get the 2 experiments I ran for project_a:

main:                                                                 
        5a4cba8 [paled-flus]
        81c2de9 [tippy-scut]

Then, I want to remove all experiments for project_a with

dvc exp remove -A

but DVC removes all experiments for all projects in the repo:

Removed experiments: 'finer-limb', 'perdu-vase', 'paled-flus', 'heady-mate', 'older-tipi', 'dusty-tang', 'alpha-gyms', 'downy-kiwi', 'moved-bomb', 'butch-iglu', 'silly-fibs', 'olive-roam', 'bosom-curb', 'unlet-soja', 'tippy-scut', 'sassy-dawn', 'braky-baby', 'coaly-kill', 'moldy-moot', 'pappy-gest', 'split-dogs', 'elite-bort', 'shock-dado', 'boozy-bade', 'pucka-thaw', 'mossy-jird', 'splay-tosh', 'famed-afro', 'finer-torc', 'bijou-yolk', 'fetid-mope', 'tangy-trio', 'legal-ludo', 'cagey-sech', 'addle-chic', 'eerie-barb', 'noisy-rods', 'sarky-joey', 'older-jest', 'umber-tote', 'sable-moit', 'aging-doge', 'puffy-esse' and 'power-harl'

Expected behavior

dvc exp remove -A should remove only the experiments within the project_a scope.

dberenbaum commented 9 months ago

Thanks for the report!

The DVC CLI does not do any monorepo slicing at the moment; that is limited to Studio. You only saw 2 experiments in dvc exp list because you did not include -A for that command. All dvc exp commands look at the entire repo.

dberenbaum commented 9 months ago

Looks like https://github.com/iterative/dvc/issues/10244#issuecomment-1899063451 is related.

I didn't notice at first that these are entirely different dvc projects with project_a/.dvc and project_b/.dvc. @pmrowla Thoughts on how dvc exp commands should handle this scenario? See the link above, where it looks like we end up pushing two copies of each experiment from each project.

pmrowla commented 9 months ago

I'm not sure we are actually pushing two copies of the experiment. My guess is that two separate experiments with the same name are being generated.

Git refs (and exps) apply to the entire repository. If we actually need to separate exps by subrepo, then we need to extend the exp ref namespace to differentiate them in the dvc init --subdir case.

The exp refs would then go somewhere like:

refs/exps/<git-sha>/subrepo/path/to/subdir/exp-name
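
To make the proposed namespace concrete, here is a minimal Python sketch of how refs could be formatted and filtered per subrepo. It is not DVC's implementation: ExpRef, format_exp_ref, refs_in_scope and the SUBREPO_MARKER constant are hypothetical names, and the layout simply follows the pattern suggested above, with the full git sha as a single path component.

```python
from typing import List, NamedTuple, Optional

EXPS_NAMESPACE = "refs/exps"
SUBREPO_MARKER = "subrepo"  # hypothetical path component, taken from the proposal above


class ExpRef(NamedTuple):
    """Hypothetical description of one experiment ref."""
    baseline_sha: str
    name: str
    subdir: Optional[str] = None  # e.g. "project_a"; None for a root-level project


def format_exp_ref(ref: ExpRef) -> str:
    """Build a ref name, adding the subrepo path only for subdir projects."""
    parts = [EXPS_NAMESPACE, ref.baseline_sha]
    if ref.subdir:
        parts += [SUBREPO_MARKER, ref.subdir.strip("/")]
    parts.append(ref.name)
    return "/".join(parts)


def refs_in_scope(refs: List[str], baseline_sha: str, subdir: str) -> List[str]:
    """Keep only the refs that belong directly to the given subrepo."""
    prefix = "/".join(
        [EXPS_NAMESPACE, baseline_sha, SUBREPO_MARKER, subdir.strip("/")]
    ) + "/"
    # Assumes experiment names contain no "/", so anything left after the
    # prefix that still has a "/" belongs to a nested subrepo and is skipped.
    return [
        r for r in refs
        if r.startswith(prefix) and "/" not in r[len(prefix):]
    ]


if __name__ == "__main__":
    all_refs = [
        format_exp_ref(ExpRef("abc123", "paled-flus", "project_a")),
        format_exp_ref(ExpRef("abc123", "finer-limb", "project_b")),
    ]
    print(refs_in_scope(all_refs, "abc123", "project_a"))
```

With a layout like this, a scoped dvc exp remove -A run from project_a would only need to touch the refs returned for that subdir, and two projects could reuse the same experiment name without colliding.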