iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

Auto push experiments at end of each stage #8843

Open dberenbaum opened 1 year ago

dberenbaum commented 1 year ago

Auto-pushing checkpoints was introduced to make it easier to recover long-running model training jobs in CI. For long-running processing jobs over multiple pipeline stages, the same behavior should be available at the end of each stage in the pipeline.

sukhovvl commented 1 year ago

We are very interested in this feature. We run long on-commit DVC pipelines in CI via dvc repro, and when they fail we currently have to rerun everything from scratch. It would be great if the intermediate results could be downloaded from the remote DVC cache.
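Until something like this exists, one possible workaround is a small wrapper that reproduces the pipeline one stage at a time and pushes after each stage. The sketch below is only an illustration, not an existing DVC feature: the function names are hypothetical, the caller is assumed to pass stage names in dependency order, and pushing with --run-cache is an assumption about what a rerun would need in order to skip stages that already finished.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes a command as a list of arguments and raises on failure.
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(list(cmd), check=True)


def repro_with_intermediate_push(
    stages: Sequence[str], run: Runner = run_checked
) -> List[str]:
    """Reproduce each stage, then push, so finished stages survive a later failure.

    `stages` is assumed to be in dependency order (hypothetical input).
    Returns the names of the stages that were reproduced and pushed.
    """
    pushed: List[str] = []
    for stage in stages:
        # Reproduce this stage; unchanged upstream stages are skipped by DVC.
        run(["dvc", "repro", stage])
        # Push outputs (and the run cache) before moving to the next stage.
        run(["dvc", "push", "--run-cache"])
        pushed.append(stage)
    return pushed


# Example call with hypothetical stage names:
#   repro_with_intermediate_push(["prepare", "featurize", "train"])
```

If a stage fails, the exception stops the loop, but everything pushed for the earlier stages is already on the remote. The runner is injectable only so the ordering of commands can be checked without DVC installed.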

sukhovvl commented 1 year ago

Furthermore, we experimented a bit with cloud parallelisation of pipeline stages, i.e. a stage that looks like a normal stage to dvc but actually starts various cloud jobs. It would be great if there was a way for those jobs to call dvc pull and get the intermediate results of the previous stages. Leaving aside for a moment the question of how to transfer the dvc.lock file to the remote workers and how to funnel the results of the stages back, it feels like intermediate pushes would open up many workarounds for these cases. Of course, it might seem like a far-fetched scenario, but maybe it's another case in point in favour of this feature.

cateseale commented 6 months ago

+1 for this feature