Open sisp opened 1 year ago
@sisp The problem of depending on executables is fairly complex overall and we don't have a perfect solution for it. Even detecting changes is challenging as it is not clear where to stop (e.g. main script or all libraries and how deep?). That's why we just suggest specifying your script as a dependency if you think it is suitable.
> [...] it is not clear where to stop (e.g. main script or all libraries and how deep?).
See #9195 on that topic. In short, DVC should rather err on the side of too much computation than on the side of false cache hits. It's a tradeoff between efficiency and correctness, but correctness certainly outweighs efficiency. But efficiency could be improved in the future.
> That's why we just suggest specifying your script as a dependency if you think it is suitable.
I think that's not sufficient because, especially in CI, I cannot force-run a stage ad hoc and false cache hits will lead to incorrect results. If DVC supported executables in the search path via `deps`, then the caching behavior would be the same as for script paths with the current cache key computation (the cache key is the content hash of the executable). So there would be no disadvantage in adding support for executables in the search path. And with #9195 implemented, the cache key for Python-based executables could be extended by taking into account the import tree in the same way as it would be done for Python scripts.
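To make the import-tree idea concrete, here is a minimal sketch that hashes a Python script together with the local modules it imports. It uses the standard library's `modulefinder`; the function name `import_tree_hash` and the `root` parameter (which limits hashing to files under one directory) are assumptions made for illustration. This is not how DVC computes cache keys, nor necessarily what #9195 would implement.

```python
import hashlib
import os
import sys
from modulefinder import ModuleFinder


def import_tree_hash(script_path, root):
    """Hash a script plus every module it imports that lives under `root`.

    Hypothetical helper for illustration only; not part of DVC.
    """
    script_path = os.path.abspath(script_path)
    root = os.path.abspath(root)

    # Statically scan the script's imports (the script is not executed).
    # The script's own directory is searched first, like `python script.py`.
    finder = ModuleFinder(path=[os.path.dirname(script_path), *sys.path])
    finder.run_script(script_path)

    # Keep only modules backed by a file under `root` (skips stdlib, builtins).
    files = sorted(
        {
            os.path.abspath(module.__file__)
            for module in finder.modules.values()
            if module.__file__
            and os.path.abspath(module.__file__).startswith(root + os.sep)
        }
    )

    digest = hashlib.sha256()
    for path in files:
        # Mix in the relative path so renames also change the hash.
        digest.update(os.path.relpath(path, root).encode())
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest(), files
```

Where to stop (the question raised above) is decided here by `root`: anything outside it is ignored, which trades completeness for speed.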
@sisp I think the same workaround is possible as I described in the second ticket. You can try to introduce a stage that runs a custom function or script with the single purpose of calculating the hashes in whatever way you want and writing them into a file that your main stage then depends on.
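A minimal sketch of such a helper script, assuming a POSIX-like environment: it resolves each executable name on `$PATH`, hashes the file content, and writes the digests to a file. The names `hash_executable`, `write_hash_file`, and the script name `hash_executables.py` are hypothetical and not part of DVC.

```python
import hashlib
import shutil
import sys
from pathlib import Path


def hash_executable(name):
    """Return (resolved path, SHA-256 hex digest) for an executable on $PATH."""
    resolved = shutil.which(name)
    if resolved is None:
        raise FileNotFoundError(f"{name!r} not found on $PATH")
    digest = hashlib.sha256(Path(resolved).read_bytes()).hexdigest()
    return resolved, digest


def write_hash_file(names, out_path):
    """Write one '<digest>  <name>' line per executable, sorted by name."""
    lines = []
    for name in sorted(names):
        # The resolved path is left out so the file does not depend on
        # where the virtual environment happens to live.
        _, digest = hash_executable(name)
        lines.append(f"{digest}  {name}\n")
    Path(out_path).write_text("".join(lines))


if __name__ == "__main__":
    # Usage (hypothetical): python hash_executables.py OUT_FILE NAME [NAME ...]
    write_hash_file(sys.argv[2:], sys.argv[1])
```

The helper stage would have to run on every `dvc repro` so that the hash file is always fresh; the main stage then lists the hash file as a dependency. One caveat: for a Python entry-point executable such as `yolo`, the file on `$PATH` is typically a small wrapper whose content may not change when the package is updated, so hashing it alone can still miss changes in the relevant code.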
Bug Report
Description
DVC doesn't seem to support specifying an executable found in the search path `$PATH` as a stage dependency, which means the stage won't rerun even when the executable has changed.

Stage commands may not always be scripts but also other kinds of executables. They may be locally developed or installed via a third-party package. For instance, I may want to train a YOLO model in one of my DVC stages using the `yolo` executable of the `ultralytics` package, so my stage command would be a call of that executable (found in the search path within the virtual environment into which I've installed `ultralytics`) and not a local script. When the `yolo` executable changes (in fact, the relevant code; related to #9195) because I've updated the `ultralytics` package, I'd like the stage to rerun.

Reproduce
1. Run `dvc init`.
2. Create `../bin/hello` with the following content:
3. Then, make it executable:
4. Add `./bin` to the search path:
5. Create `dvc.yaml` with the following content:
6. Run `dvc repro` and observe the following error:

When I omit the `deps` block, then unsurprisingly the stage won't be rerun after I change the executable:

Expected
It should be possible to declare an executable as a dependency via the `deps` field.

Environment information
Output of `dvc doctor`:

Additional Information (if any):
A possible solution to the problem might be an extension of the `deps` syntax like this: