iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0
13.38k stars 1.16k forks source link

remove `dvc.run` and related tests #9671

Open skshetry opened 1 year ago

skshetry commented 1 year ago

We removed dvc run command in 3.0 and removed support for running .dvc file. But dvc.run() still exists and lots of tests are still using it.

We should remove the use of dvc.run() from tests and gradually get rid of dvc.run() API.

ArshErgon commented 1 year ago

hi @skshetry can you assign it to me? as I understand the we need to remove the run() command but if the tests are like this

stage = dvc.run(
        cmd="python copy.py params.py metrics.py",
        metrics_no_cache=["metrics.py"],
        params=["params.py:INT,FLOAT,DICT,Train,Klass"],
        name="copy-file",
    )

and the return value is using somewhere else, how to handle it? should I replace the dvc.run to dvc.repro?

skshetry commented 11 months ago

Hi @ArshErgon, sorry for the late reply.

dvc run command no longer exists, and there is no 1:1 equivalent in dvc now. There is dvc stage add --run on the command side, and dvc.stage.add() in API side (with no run support). repro runs the whole DAG of stages, whereas run only runs one stage that was created in memory.

One way would be to replace dvc.run() with dvc.stage.add(), but that won't work for all the tests. We'd need to replace them with some APIs that are closer to what's being tested and exists.

I don't want to be discouraging, but this task requires a deeper understanding of dvc's codebase, so this is definitely not a https://github.com/iterative/dvc/labels/good%20first%20issue.

dberenbaum commented 11 months ago

@skshetry This is less user facing. What's your take on the priority?

skshetry commented 10 months ago

I think we can downgrade it to p2, since we have a lot of p1s. But it is important, because a lot of tests depend on dvc.run() which does not exist in real world.