iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

exp run: showing unhelpful error message for invalid pipeline #8974

Open dberenbaum opened 1 year ago

dberenbaum commented 1 year ago

Background in https://github.com/iterative/dvc/issues/8973.

Here is the setup from that issue:

# train.py
from random import random

from dvclive import Live

NUM_EPOCHS = 10

with Live(save_dvc_exp=True) as live:

    live.log_param("NUM_EPOCHS", NUM_EPOCHS)

    for epoch in range(NUM_EPOCHS):
        step = live.read_step()
        live.log_metric("metric", random())
        live.log_param("my_step", step)
        live.next_step()

# dvc.yaml
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
    outs:
      - dvclive:
          checkpoint: true
    metrics:
      - dvclive/metrics.json
    plots:
      - dvclive/plots

When running dvc exp run, the error message looks like:

[Screenshot of the dvc exp run error output: 216764996-eb2efe78-b5d0-4c47-b79e-1053280e322f]

Expected

dvc repro for a similar setup gives an error like:

ERROR: The output paths:
'dvclive'('train')
'dvclive/metrics.json'('train')
overlap and are thus in the same tracked directory.
To keep reproducibility, outputs should be in separate tracked directories or tracked individually.

The same error from dvc repro should be shown in dvc exp run.
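
For anyone reading along, here is a rough sketch of the kind of overlap the dvc repro message is describing. This is not DVC's actual implementation. `find_overlaps` and `declared` are hypothetical names made up for illustration, and treating the metrics and plots entries as outputs alongside outs is an assumption.

```python
from pathlib import PurePosixPath


def find_overlaps(outputs):
    """Return (parent, child) pairs where one output path contains another.

    `outputs` is a list of (path, stage_name) tuples. Hypothetical helper,
    not part of DVC.
    """
    overlaps = []
    for parent, parent_stage in outputs:
        for child, child_stage in outputs:
            if parent == child:
                continue
            # A path overlaps another if it is one of its ancestor directories.
            if PurePosixPath(parent) in PurePosixPath(child).parents:
                overlaps.append(((parent, parent_stage), (child, child_stage)))
    return overlaps


# Paths declared by the `train` stage in the dvc.yaml above.
# Assumption: outs, metrics, and plots are all treated as outputs.
declared = [
    ("dvclive", "train"),
    ("dvclive/metrics.json", "train"),
    ("dvclive/plots", "train"),
]

for (parent, p_stage), (child, c_stage) in find_overlaps(declared):
    print(f"'{parent}'('{p_stage}') contains '{child}'('{c_stage}')")
```

The point is that `dvclive` is declared as a directory output while `dvclive/metrics.json` is declared separately inside it, which is the condition the dvc repro error reports.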

dberenbaum commented 1 year ago

@skshetry This is another issue to keep in mind re: #9370