iterative / dvc

🦉 Data Versioning and ML Experiments
Apache License 2.0

DVC post-checkout hook: complains about unsaved files (which have not changed) #10584

Open JulianoLagana opened 1 month ago

JulianoLagana commented 1 month ago

Bug Report


DVC post-checkout hook complains that it can't remove unsaved files without confirmation, but these files have not changed.

We recently upgraded from dvc 2.58.1 to 3.55.2. After a while with no problems, I noticed that our post-checkout hook sometimes fails, complaining that it can't remove unsaved files without confirmation. At first I assumed there really were unsaved files, so I ran dvc checkout --force a few times. However, the problem kept coming back every now and then when switching branches.

I then started to do some digging. First, I noticed that even though the post-checkout hook was failing due to unsaved files, dvc status showed no changes. Furthermore, the md5 hash for the "unsaved" file in question (which I computed with md5 filename) exactly matched the one in the .dir file in the cache (this file is inside a folder which is an output of one of our stages). Lastly, I also noticed that the md5 of the file does not change after dvc checkout --force, even though I get an Applying changes M ./ printout.
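For anyone who wants to repeat the hash comparison described above across a whole output folder instead of one file at a time, here is a minimal sketch. It assumes the .dir file in the cache is a JSON list of objects with "md5" and "relpath" keys, and it hashes raw bytes, the same as running md5 filename by hand. The function names and the paths in the usage note are hypothetical and not part of DVC.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(dir_manifest: Path, workspace_dir: Path) -> list[str]:
    """List relpaths whose workspace MD5 differs from the manifest entry.

    Assumption: the manifest is a JSON list of {"md5": ..., "relpath": ...}
    objects. Files missing from the workspace are reported as well.
    """
    entries = json.loads(dir_manifest.read_text())
    mismatches = []
    for entry in entries:
        target = workspace_dir / entry["relpath"]
        if not target.is_file() or file_md5(target) != entry["md5"]:
            mismatches.append(entry["relpath"])
    return sorted(mismatches)
```

Calling find_mismatches(Path("<path to the .dir file in the cache>"), Path("<output folder>")) returns an empty list when every tracked file matches its recorded hash, which is the situation described in this report.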

At the moment I don't really know what the problem is, and would appreciate assistance.


I am not able to reproduce this at will, and I haven't yet figured out exactly what triggers it.


I would expect the DVC post-checkout hook to complete without errors when I don't have any unsaved files. Alternatively, if I do have unsaved files, I would expect dvc status to point them out to me, or at least that their MD5 hash would not match the one tracked by dvc (and would then match it after something like dvc checkout --force).

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.55.2 (pip)
Platform: Python 3.10.15 on macOS-15.0.1-arm64-arm-64bit
    dvc_data = 3.16.5
    dvc_objects = 5.1.0
    dvc_render = 1.0.2
    dvc_task = 0.3.0
    scmrepo = 3.3.7
    http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    s3 (s3fs = 2024.2.0, boto3 = 1.34.34)
    Global: /Users/juliano/Library/Application Support/dvc
    System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s3s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/8ac7a2e9eb78ffa8d315cce7b95313f0

Pre-commit configuration:

fail_fast: true

repos:
  - repo:
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
        exclude: '.*dvc\.lock'
      - id: end-of-file-fixer
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$'
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
        args: ["--maxkb=3000"]
      - id: debug-statements
        language_version: python3
  - repo:
    rev: 23.1.0
    hooks:
      - id: black
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$'
        language_version: python3
  - repo:
    rev: 5.12.0
    hooks:
      - id: isort
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$'
        name: isort (python)
  - repo:
    rev: 6.0.0
    hooks:
      - id: flake8
        args: ["--max-line-length=225"]
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$|^src/catella/btr/dash/dataiku\.py$|^src/catella/btr/utils/data_utils\.py$'
  - repo:
    rev: v1.3.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests, types-PyYAML]
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$|^src/catella/btr/dash/dataiku\.py$|^src/catella/property_research_agent/main\.py$|^src/catella/property_research_agent/app\.py$'
  - repo: local
    hooks:
      - id: pytest-check
        name: pytest
        entry: pytest tests/
        language: system
        pass_filenames: false
        always_run: true
        stages:
          - pre-commit
  - repo:
    rev: 3.55.2
    hooks:
      - id: dvc-pre-push
        additional_dependencies: [".[s3]"]
        language_version: python3
        stages:
          - push
      - always_run: true
        id: dvc-post-checkout
        additional_dependencies: [".[s3]"]
        language_version: python3
        stages:
          - post-checkout

KansaiUser commented 2 weeks ago

This also happens in the DagsHub tutorial.