Bug Report
Description
In a monorepo with a .dvc directory at the root and multiple subdirectory projects (each with its own dvc.yaml file), dvc repro seems to check the entire monorepo even when explicitly given a dvc.yaml file from a subdirectory (and even when run from that subdirectory). I am not sure why it does that, but with a particularly large monorepo this can slow things down considerably. For example, with the example repo below set to 1000 projects, the time to run simple experiments increases from about 2 seconds to about 24 seconds (1000 projects is a lot, but they are very simple, and so is their directory structure).
Even if the other directories don't contain a dvc.yaml file at all, dvc repro still tries to collect stages from them (whereas I would expect it not to even look outside of the PWD).
With dvc exp run the pattern is the same, only a bit more is going on there, since the command does more than just dvc repro.
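To make the difference in scan scope concrete, here is a minimal stdlib-only sketch. It is not DVC's code and says nothing about how DVC actually collects stages; it only builds a toy layout and counts how many dvc.yaml files a directory walk visits when it starts at the repo root versus inside one project. The project_<i> directory names and all helper functions are made up for illustration.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def make_monorepo(root: Path, n_projects: int) -> None:
    """Create a toy layout: root/.dvc plus project_<i>/dvc.yaml (hypothetical names)."""
    (root / ".dvc").mkdir()
    for i in range(n_projects):
        project = root / f"project_{i}"
        project.mkdir()
        (project / "dvc.yaml").write_text("stages: {}\n")


def find_pipeline_files(start: Path) -> list[Path]:
    """Walk `start` and return every dvc.yaml found beneath it."""
    found = []
    for dirpath, dirnames, filenames in os.walk(start):
        # Do not descend into the .dvc directory itself.
        dirnames[:] = [d for d in dirnames if d != ".dvc"]
        if "dvc.yaml" in filenames:
            found.append(Path(dirpath) / "dvc.yaml")
    return found


def compare_scopes(n_projects: int) -> tuple[int, int]:
    """Return (files seen by a repo-wide walk, files seen from one project)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_monorepo(root, n_projects)
        whole_repo = find_pipeline_files(root)
        one_project = find_pipeline_files(root / "project_0")
        return len(whole_repo), len(one_project)


if __name__ == "__main__":
    repo_wide, project_only = compare_scopes(1000)
    print(f"repo-wide walk:    {repo_wide} dvc.yaml files")
    print(f"project-only walk: {project_only} dvc.yaml file(s)")
```

The repo-wide walk grows with the number of projects, while the project-only walk stays constant, which is the behaviour I would expect from a command run inside a single project.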
Reproduce
There is a testing repo here with instructions on how to test this and reproduce the issue in the README.
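For anyone who wants to compare timings without the testing repo, a small helper along these lines could be used. This is only a sketch under assumptions: it assumes dvc is on the PATH, that it is run from inside one project subdirectory, and that the commands of interest are dvc repro -v dvc.yaml and dvc exp run -v. The time_command function and the COMMANDS list are hypothetical names, not part of DVC.

```python
import shutil
import subprocess
import sys
import time

# Assumed invocations; adjust to whatever you actually run in your project.
COMMANDS = [
    ["dvc", "repro", "-v", "dvc.yaml"],
    ["dvc", "exp", "run", "-v"],
]


def time_command(cmd, cwd=None):
    """Run `cmd` in `cwd` and return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    if shutil.which("dvc") is None:
        print("dvc not found on PATH; nothing to time", file=sys.stderr)
    else:
        for cmd in COMMANDS:
            seconds, code = time_command(cmd)
            print(f"{' '.join(cmd)}: {seconds:.1f}s (exit code {code})")
```

Running it once in a monorepo with a handful of projects and once with many projects should show whether the run time scales with the number of unrelated projects.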
Expected
I would expect dvc repro to scan only the PWD of the dvc.yaml file (and its subdirectories) and not go through the entire directory tree. The same goes for dvc exp run.
Additional Information (if any):
Here are some logs that I generated with verbose runs of dvc repro and dvc exp. The first two are the output when run from a single project in a monorepo with 5 projects in total (all of them with their own dvc.yaml). The last one is from a monorepo with 2 projects, one of which does not contain any dvc.yaml file at all.
dvc_repro.log dvc_exp_run.log dvc_exp_run_projects_wo_dvc.log