iterative / dvc

plots: Multiple fields result in invalid dvc.yaml #10438

Closed mattlbeck closed 1 month ago

mattlbeck commented 1 month ago

Bug Report

Description

Most of the examples given at https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#available-configuration-fields do not appear to produce a valid dvc.yaml. Validation fails with the error: "expected str in ... -> y"

Reproduce

stages:
  train:
    cmd: echo "train stage"
    plots:
      - plot.csv:
          y: [A, B]

> dvc exp run dvc.yaml
'./dvc.yaml' validation failed.

expected str, in stages -> train -> plots -> 0 -> plot.csv -> y, line 6, column 14
  5 │     - plot.csv:
  6 │   │     y: [A, B]

To get a valid dvc.yaml, you can change the value to a single string: y: A

Expected

A valid dvc.yaml, as in the documented examples.

Environment information

DVC version: 3.50.1 (conda)
---------------------------
Platform: Python 3.12.3 on macOS-14.3.1-arm64-arm-64bit
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.2
Supports:
        http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.3.1, boto3 = 1.34.34)
Config:
        Global: /Users/mattb/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/ac355dc86b46032b36e0464bd276ecf1

Additional output with --verbose

2024-05-23 23:24:14,932 ERROR: './dvc.yaml' validation failed: expected str for dictionary value @ data['stages']['train']['plots'][0]['plot.csv']['y']
Traceback (most recent call last):
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/utils/strictyaml.py", line 268, in validate
    return schema(data)
           ^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/voluptuous/schema_builder.py", line 281, in __call__
    return self._compiled([], data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/voluptuous/schema_builder.py", line 625, in validate_dict
    return base_validate(path, data.items(), out)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/voluptuous/schema_builder.py", line 458, in validate_mapping
    raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: expected str for dictionary value @ data['stages']['train']['plots'][0]['plot.csv']['y']

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/commands/experiments/run.py", line 14, in run
    self.repo.experiments.run(
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/experiments/__init__.py", line 354, in run
    return run(self.repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/experiments/run.py", line 77, in run
    return repo.experiments.reproduce_one(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/experiments/__init__.py", line 125, in reproduce_one
    self.queue_one(exp_queue, **kwargs)
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/experiments/__init__.py", line 136, in queue_one
    return self.new(queue, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/experiments/__init__.py", line 218, in new
    return queue.put(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/experiments/queue/workspace.py", line 38, in put
    return self._stash_exp(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/experiments/queue/base.py", line 325, in _stash_exp
    self._stash_commit_deps(*args, **kwargs)
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/experiments/queue/base.py", line 384, in _stash_commit_deps
    self.repo.commit(
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/commit.py", line 57, in commit
    for info in self.stage.collect_granular(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/stage.py", line 396, in collect_granular
    (out,) = self.repo.find_outs_by_path(target, strict=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/__init__.py", line 541, in find_outs_by_path
    outs = outs or self.index.outs_graph
                   ^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/funcy/objects.py", line 25, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
                                                  ^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/__init__.py", line 282, in index
    return Index.from_repo(self)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/index.py", line 330, in from_repo
    for _, idx in collect_files(repo, onerror=onerror):
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/index.py", line 90, in collect_files
    index = Index.from_file(repo, file_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/repo/index.py", line 356, in from_file
    stages=list(dvcfile.stages.values()),
                ^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/funcy/objects.py", line 25, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
                                                  ^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/dvcfile.py", line 313, in stages
    return self.LOADER(self, self.contents, self.lockfile_contents)
                             ^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/funcy/objects.py", line 25, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
                                                  ^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/dvcfile.py", line 298, in contents
    return self._load()[0]
           ^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/dvcfile.py", line 140, in _load
    return self._load_yaml(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/dvcfile.py", line 151, in _load_yaml
    return strictyaml.load(
           ^^^^^^^^^^^^^^^^
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/utils/strictyaml.py", line 296, in load
    validate(data, schema, text=text, path=path, rev=rev)
  File "/Users/mattb/micromamba/envs/deepfake-universal/lib/python3.12/site-packages/dvc/utils/strictyaml.py", line 270, in validate
    raise YAMLValidationError(exc, path, text, rev=rev) from exc
dvc.utils.strictyaml.YAMLValidationError: './dvc.yaml' validation failed

dberenbaum commented 1 month ago

plots in those examples is a top-level key, not a subkey under a stage. So your example should look like:

stages:
  train:
    cmd: echo "train stage"
    outs:
      - plot.csv
plots:
  - plot.csv:
      y: [A, B]

We have kept simple plots supported under stages for convenience and to avoid breaking old dvc.yaml files, but to support more flexible plots, we had to separate them from stage outputs (for example, plots can now be defined with outputs from multiple stages).
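
As a rough illustration of the multi-stage case mentioned above, here is a minimal sketch of a top-level plot that draws on outputs from two stages. The stage names, scripts, file names (train_metrics.csv, eval_metrics.csv), and column names (epoch, loss) are hypothetical and used only for illustration. The sketch assumes that y accepts a mapping from file path to field name, as the top-level plots documentation describes:

```yaml
# Hypothetical dvc.yaml: all stage, file, and column names are illustrative.
stages:
  train:
    cmd: python train.py
    outs:
      - train_metrics.csv
  evaluate:
    cmd: python evaluate.py
    outs:
      - eval_metrics.csv

# Top-level plots are separate from stage outputs, so a single plot
# can combine data files produced by different stages.
plots:
  - train_vs_eval_loss:
      x: epoch
      y:
        train_metrics.csv: loss
        eval_metrics.csv: loss
```

Here the plot ID (train_vs_eval_loss) is an arbitrary name rather than a file path, because the data sources are listed under y.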

mattlbeck commented 1 month ago

I see, makes sense. Thanks for the clarification!