iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

Exp run: reformats dvc.yaml #10154

Open Danila89 opened 11 months ago

Danila89 commented 11 months ago

Description

In DVC 3.33.3, dvc exp run tends to reformat the dvc.yaml file. It was mentioned on Discord that the cause is the default line width in ruamel.yaml. Unfortunately, besides breaking long lines, it also merges lines, which is pretty inconvenient. After this formatting, the dvc.yaml files from my practice are hardly readable (an example is attached below):

[image: screenshot of the reformatted dvc.yaml]

I think that dvc.yaml should not be altered during dvc exp run.

Reproduce

git clone https://github.com/Danila89/dvc_empty.git && cd dvc_empty && git pull --all && git checkout dsavenkov/dvc_yaml_formatting && dvc exp run -n something

After running this command, dvc.yaml will have unstaged changes.

Expected

dvc.yaml is unchanged
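
One way to check this expectation automatically is to hash the file before and after a command runs. The sketch below is only an illustration using the Python standard library; the helper names sha256_of and changed_by_command are hypothetical and are not part of DVC.

```python
import hashlib
import subprocess
import sys
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_by_command(path, command):
    """Run `command` (a list of arguments) and report whether `path` changed.

    Any byte-level difference counts, including re-wrapped lines or
    added trailing whitespace.
    """
    before = sha256_of(path)
    subprocess.run(command, check=True)
    return sha256_of(path) != before
```

For the reproduction above, a call such as changed_by_command("dvc.yaml", ["dvc", "exp", "run", "-n", "something"]) would be expected to return False if dvc.yaml were left untouched.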

Environment information

Output of dvc doctor:

(base) danila.savenkov@RS-UNIT-0099 dvc_empty % dvc doctor
DVC version: 3.33.3 (pip)
-------------------------
Platform: Python 3.10.9 on macOS-13.3.1-arm64-arm-64bit
Subprojects:
        dvc_data = 2.22.6
        dvc_objects = 1.4.9
        dvc_render = 1.0.0
        dvc_task = 0.3.0
        scmrepo = 1.5.0
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.5.0, boto3 = 1.26.76)
Config:
        Global: /Users/danila.savenkov/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/64bbbded2e55036b006c56ceaefa98e1

elvijsm2 commented 2 months ago

A similar issue I've struggled with: something like dvc repro -s some_stage updates dvc.lock not only in the part relevant to some_stage but also reformats it elsewhere, generally in a way that leaves trailing whitespace throughout the file. So if dvc.lock contains something like this

   some_stage:
     cmd: python -m src.process_zip
       long/path/to/data.zip
     deps:
     - path:
         long/path/to/data.zip
       hash: md5
       md5: a5074fdca2d1bf921dd9ea26c61646a3
       size: 13013258
     outs:
     - path: path/to/out.zip
       hash: md5
       md5: a5074fdca2d1bf921dd9ea26c61646a3
       size: 13013258
   other_stage:
     cmd: python -m src.other_stuff
     deps:
     - path: src/other_stuff.py
       hash: md5
       md5: a5074fdca2d1bf921dd9ea26c61646a3
       size: 13013258
     outs:
       # ...

then running dvc repro -s other_stage will modify the entry under some_stage by inserting a space at the end of cmd: python -m src.process_zip and after the - path: that precedes long/path/to/data.zip.

This seems to happen when the unwrapped line

cmd: python -m src.process_zip long/path/to/data.zip

would exceed 140 characters, or when the unwrapped line

- path: long/path/to/data.zip

would exceed 90 characters (I am not sure about the exact thresholds).

What I end up doing is running pre-commit run --files dvc.lock || git add dvc.lock to fix it after every update to dvc.lock...
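
For readers without pre-commit set up, the same cleanup can be approximated in a few lines of Python. This is a minimal sketch that assumes the hook doing the work is a trailing-whitespace fixer (the comment does not say which hooks are configured); the function names strip_trailing_whitespace and clean_file are hypothetical.

```python
from pathlib import Path


def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def clean_file(path):
    """Rewrite `path` in place; return True if anything was changed."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    cleaned = strip_trailing_whitespace(original)
    if cleaned != original:
        p.write_text(cleaned, encoding="utf-8")
        return True
    return False
```

Calling clean_file("dvc.lock") after each dvc repro would remove the inserted trailing spaces. It does not undo any line re-wrapping, so it only addresses the whitespace part of the problem.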