Closed adamliter closed 4 months ago
Have you tried `dvc commit`? It force-synchronizes your workspace to `dvc.lock` without actually running the stages.
@skshetry Ah, thank you! I misunderstood the help text of `dvc commit`. It says "Record changes [...] by storing the current versions in the cache." So I assumed this was only for updating things in `.dvc/cache` and didn't even think to try it.
But you're right, this does exactly what I want. Thank you.
Edit: It looks like the documentation on the website is much clearer about the use cases, but I did not check there.
The help message needs to be updated. Looking at the `git blame`, it hasn't been updated for four years.
@skshetry Do you think I should reopen this as a request to update the help message for `dvc commit`?
Something I occasionally find myself doing is manually updating my `dvc.lock` file if I've made a change to a file that is a dependency of some stage but the change I made wouldn't actually impact the outputs of any of the DVC stages. When this happens, I usually manually update the MD5 hash and size of the file in the `dvc.lock` file and then make a commit explaining what I did, and why it wouldn't actually result in a change to any of the stages.

Some example scenarios:
- [...] `train.py`, which we'd definitely want to be in the `deps` section for the DVC stage called `train`, but just adding some logging shouldn't invalidate the current outputs).
- [...] `fetch_data` stage go out of date since this `.sql` file will be in the `deps` section for that stage, which would, in all other circumstances, be a good thing. But this migration has no change on the data and is outside of our control.

Those are just a few scenarios where I've found myself manually updating a `dvc.lock` file. Even though I'd generally want changes to the files I'm tracking as `deps` of certain stages to result in this behavior, there are some cases where I know the changes I've made to a `deps`-tracked file should effectively be no-op changes. The feature request is to expose some API for updating the `dvc.lock` in such cases instead of having to do it manually.

Do you think this is something you'd be open to? Maybe there are some pitfalls I'm not seeing or thinking through. If you're open to it, I'm not sure what a good name for the API would be. Maybe something like `dvc update-lock` that takes a filename as an argument and then replaces the hash and size for that file with its new hash and size anywhere it is found in the `dvc.lock` file ... ?
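
For illustration only, here is a rough Python sketch of what the proposed behavior might look like. It is not DVC's API: `file_md5_and_size` and `update_lock_entries` are hypothetical names, the lock file is modeled as an already-parsed dictionary with an assumed `stages` / `deps` / `outs` layout and `path`, `md5`, and `size` keys, and hashing the raw bytes may not match how every DVC version computes its hashes.

```python
import hashlib
import os
import tempfile


def file_md5_and_size(path):
    """Return (md5 hex digest, size in bytes) of the raw file contents.

    Assumption: hashing raw bytes; a given DVC version may hash differently.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


def update_lock_entries(lock, path, md5, size):
    """Replace md5/size on every deps/outs entry whose path matches.

    `lock` is a dict mirroring an assumed dvc.lock layout (hypothetical helper).
    Returns the number of entries that were rewritten.
    """
    updated = 0
    for stage in lock.get("stages", {}).values():
        for section in ("deps", "outs"):
            for entry in stage.get(section, []):
                if entry.get("path") == path:
                    entry["md5"] = md5
                    entry["size"] = size
                    updated += 1
    return updated


# Demo with made-up data: a stale entry for train.py gets refreshed.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "train.py")
    with open(script, "wb") as fh:
        fh.write(b"print('training')\n")
    md5, size = file_md5_and_size(script)

lock = {
    "schema": "2.0",
    "stages": {
        "train": {
            "cmd": "python train.py",
            "deps": [{"path": "train.py", "md5": "stale", "size": 0}],
            "outs": [{"path": "model.pkl", "md5": "unchanged", "size": 123}],
        }
    },
}
changed = update_lock_entries(lock, "train.py", md5, size)
```

As noted in the replies, `dvc commit` already covers this use case, so a sketch like this is mainly useful for seeing what the manual edit amounts to.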