iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

repro: support running stage via its output's name #3875

Open skshetry opened 4 years ago

skshetry commented 4 years ago

We should probably accept an output as a target. It could solve some problems with autocomplete and improve the user experience. Related: #3743 (especially the last comments).

$ dvc repro model.pkl

Originally posted by @shcheklein in #3777

efiop commented 3 years ago

Looks like moving to collect_granular in https://github.com/iterative/dvc/blob/f09c27e74c0a79c30c2c09a6b82201f4930ba63b/dvc/repo/reproduce.py#L117 should do the trick.

skshetry commented 3 years ago

@efiop, globbing is not possible with collect_granular(). So either we use a different function depending on whether glob is set, or, if we want to make globbing the default, we do not support this at all.

efiop commented 3 years ago

@skshetry I didn't mean globbing, just output -> dvcfile mapping.
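
To make the "output -> dvcfile mapping" idea concrete, here is a minimal sketch of resolving a target that may be either a stage name or an output path. It is not DVC's implementation: the `STAGES` dict and the `stage_for_target` helper are hypothetical names invented for illustration, and a real lookup would go through the repo's stage collection rather than a hard-coded dict.

```python
from pathlib import PurePosixPath

# Hypothetical pipeline description (an assumption for this sketch):
# stage name -> list of declared output paths.
STAGES = {
    "prepare": ["data/prepared"],
    "train": ["model.pkl"],
}


def stage_for_target(target, stages=STAGES):
    """Return the name of the stage that `target` refers to.

    `target` may be a stage name or the path of one of a stage's outputs.
    """
    # A stage name is accepted as-is.
    if target in stages:
        return target

    # Otherwise treat the target as an output path and look for its producer.
    wanted = PurePosixPath(target)
    for name, outs in stages.items():
        if any(PurePosixPath(out) == wanted for out in outs):
            return name

    raise KeyError(f"no stage produces {target!r}")
```

With such a mapping, `dvc repro model.pkl` would resolve to the stage that declares `model.pkl` as an output (here, `train`) and reproduce that stage.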

Vonski commented 3 years ago

I can do this one. Could I take it?

Vonski commented 3 years ago

Currently, when collect_granular is given a glob that matches any output files, it returns the same PathInfo for every stage it finds, and that PathInfo is built directly from the glob string. The filter_info returned by collect_granular is sometimes used further down the line and can end up being passed to methods such as path_info.isin_or_eq, which do not handle globs. So if someone later adds glob=True to a collect_granular call in places such as Repo.used_cache or the commit method in dvc/repo/commit.py, it may not work as expected, because the returned filter_info is used afterwards and is built from the glob.
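
The problem described above can be illustrated with a small standalone sketch. It does not use DVC's PathInfo class: the `isin_or_eq` function below is a hypothetical stand-in built on `pathlib`, assumed to behave like a literal "equal to or inside" check, and the output paths are made up.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def isin_or_eq(path, other):
    """Stand-in for a literal containment check (not DVC's own method).

    True if `path` equals `other` or lies somewhere below it.
    """
    path, other = PurePosixPath(path), PurePosixPath(other)
    return path == other or other in path.parents


outs = ["models/a.pkl", "models/b.pkl", "data/train.csv"]
glob_target = "models/*.pkl"

# Using the glob string itself as the filter compares it literally,
# so no output is equal to it, inside it, or a parent of it.
literal_hits = [
    out
    for out in outs
    if isin_or_eq(out, glob_target) or isin_or_eq(glob_target, out)
]

# Expanding the glob into concrete paths first gives filters
# that the literal check can handle.
expanded = [out for out in outs if fnmatch(out, glob_target)]
concrete_hits = [
    out for out in outs if any(isin_or_eq(out, flt) for flt in expanded)
]
```

Here `literal_hits` ends up empty while `concrete_hits` contains both `.pkl` outputs, which suggests one possible direction: return a concrete filter per matched output instead of a single filter built from the glob.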

jorgeorpinel commented 3 years ago

Hi. Should this also apply to status and maybe run?