Open itcarroll opened 2 years ago
I'm just curious why you would leave `config.yaml.dev` untracked by Git, since this would break your repository when you share it with others using Git. Is this file meant to be some kind of local parameter?
In a repo we have Git-tracked files (code, repo config), DVC-tracked files (models, data, etc.), and files tracked by neither Git nor DVC (some sensitive or local config?).
I am not sure I would call it a bug. But needing to load params just to clean up the cache does not sound necessary.
@pared @itcarroll, DVC treats `dvc.yaml` as a single source of truth. So the file paths that are being tracked are read from `dvc.yaml` rather than from `dvc.lock`. Later it just reads the hashes for those file paths in `dvc.lock` and builds an index, which it will use to delete files (in the case of `gc`). In a way, `dvc.lock` is a database of metadata for entries in `dvc.yaml`.
This allows us to maintain a simpler and unified model where we can treat `dvc.yaml` and `dvc.lock` entries as being one, and avoid the need for reconciliation.
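As a rough illustration of that model, here is a toy sketch in plain Python. It is not DVC's actual code: the function name `build_index` and the simplified dictionary layouts standing in for the two files are assumptions made up for this example. The point is only that paths are taken from the `dvc.yaml` side and hashes are then looked up on the `dvc.lock` side.

```python
def build_index(dvc_yaml, dvc_lock):
    """Toy model: tracked paths come from dvc.yaml, hashes from dvc.lock.

    Both arguments are simplified, hypothetical dictionaries, not the
    real file schemas.
    """
    index = {}
    for stage_name, stage in dvc_yaml["stages"].items():
        # Look up the lock entries recorded for this stage, if any.
        locked_outs = dvc_lock.get("stages", {}).get(stage_name, {}).get("outs", [])
        locked = {out["path"]: out["md5"] for out in locked_outs}
        # Only paths declared in dvc.yaml make it into the index.
        for path in stage.get("outs", []):
            if path in locked:
                index[path] = locked[path]
    return index


dvc_yaml = {"stages": {"train": {"cmd": "python train.py", "outs": ["model.pkl"]}}}
dvc_lock = {
    "stages": {
        "train": {
            "outs": [
                {"path": "model.pkl", "md5": "abc123"},
                {"path": "stale.pkl", "md5": "def456"},  # not declared in dvc.yaml
            ]
        }
    }
}

index = build_index(dvc_yaml, dvc_lock)
print(index)
```

In this sketch the lock entry for `stale.pkl` is ignored because nothing in the `dvc.yaml` dictionary declares it, which is the "single source of truth" idea in miniature.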
@karajan1001 Yes, maybe ".local" would be a better suffix than ".dev"
Given what @skshetry said, I realize there's a more straightforward way to hit this same error. Simply forget to add your parameters file to the commit where you add a reference to it in the dvc.yaml file. Now you've broken `gc --all-commits` with no way to fix it by adding a new commit. I'd call that a bug, but up to y'all.
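To make the "no way to fix it by adding a new commit" point concrete, here is a small simulation in plain Python. Everything in it is hypothetical: commits are modelled as dictionaries of files, and `collect_used`, `gc_all_commits`, and `MissingParamsFile` are names invented for this sketch, not DVC internals. It only assumes, as described above, that each commit's `dvc.yaml` must be resolved (including its parameters file) before that commit's hashes can be collected.

```python
class MissingParamsFile(Exception):
    """Raised when a commit's dvc.yaml references a file absent from that commit."""


def collect_used(commit_files):
    # Hypothetical stand-in for resolving dvc.yaml within a single commit.
    params_file = commit_files["dvc.yaml"]["params_file"]
    if params_file not in commit_files:
        raise MissingParamsFile(params_file)
    return set(commit_files["dvc.lock"])


def gc_all_commits(history):
    # Every commit in the history is visited, so one bad commit is enough to fail.
    used = set()
    for commit_files in history:
        used |= collect_used(commit_files)
    return used


def try_gc(history):
    try:
        gc_all_commits(history)
        return "ok"
    except MissingParamsFile as exc:
        return f"failed: {exc}"


# A commit that references an uncommitted parameters file.
bad = {"dvc.yaml": {"params_file": "config.yaml.dev"}, "dvc.lock": ["abc123"]}
# A later commit that references a file which is actually committed.
fixed = {
    "dvc.yaml": {"params_file": "config.yaml"},
    "config.yaml": {},
    "dvc.lock": ["abc123"],
}

results = [try_gc([fixed]), try_gc([bad]), try_gc([bad, fixed])]
print(results)
```

The last case is the one that matters: the bad commit stays in the history, so a follow-up commit cannot repair the all-commits traversal in this model.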
@dberenbaum I guess to some extent #6150?
Bug Report
Description
When "dvc.yaml" references a parameter file in order to reproduce a foreach stage, `dvc gc --all-commits` will error if the parameter file does not exist in a commit containing this "dvc.yaml" version. The use case is not too convoluted: it involves a development version of a parameters file that is not committed but is used in "dvc.yaml" during development, where the reference is easy to commit accidentally. @pared suggested this may be a bug if I could reproduce it.
Reproduce
Initialize git and dvc. Add the following three files:
Run `dvc repro`, then `git add . && git commit -m "only commit"`. To generate the following error, run `dvc gc --all-commits -v` and answer "y" at the prompt.
Expected
It is not reasonable for `dvc gc --all-commits` to require a parameter file referenced from a committed `dvc.yaml`, when all it should need is a `dvc.lock`. It introduces the situation above, where an accidental commit of a reference to a temporary parameters file permanently breaks the ability to garbage collect with `--all-commits`.
Environment information
Output of `dvc doctor`: