iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

DVC post-checkout hook: complains about unsaved files (which have not changed) #10584

Open JulianoLagana opened 1 month ago

JulianoLagana commented 1 month ago

Bug Report

Description

DVC post-checkout hook complains that it can't remove unsaved files without confirmation, but these files have not changed.

We recently upgraded from dvc 2.58.1 to 3.55.2. After a while with no problems, I noticed that our post-checkout hook sometimes fails, complaining that it can't remove unsaved files without confirmation. At first I assumed there really were unsaved files, so I ran dvc checkout --force a few times. However, the problem kept coming back every now and then when switching between branches.

I then started to do some digging. First, I noticed that even though the post-checkout hook was failing due to unsaved files, dvc status showed no changes. Furthermore, the md5 hash for the "unsaved" file in question (which I computed with md5 filename) exactly matched the one in the .dir file in the cache (this file is inside a folder which is an output of one of our stages). Lastly, I also noticed that the md5 of the file does not change after dvc checkout --force, even though I get an Applying changes M ./ printout.
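For anyone who wants to repeat the MD5 comparison described above across a whole output folder rather than for one file, here is a minimal sketch. It assumes the .dir file in the cache is a JSON list of entries with "md5" and "relpath" keys, and that hashes are taken over the raw file bytes; both are assumptions about the cache layout, not something confirmed in this report. The helper names file_md5 and find_mismatches are made up for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(dir_manifest, workspace_dir):
    """Compare workspace files against the entries of a .dir manifest.

    Assumption: the manifest is a JSON list of
    {"md5": "<hex digest>", "relpath": "<path relative to the output dir>"}.

    Returns a list of (relpath, expected_md5, actual_md5) tuples, where
    actual_md5 is None if the file is missing from the workspace.
    """
    entries = json.loads(Path(dir_manifest).read_text())
    mismatches = []
    for entry in entries:
        target = Path(workspace_dir) / entry["relpath"]
        actual = file_md5(target) if target.is_file() else None
        if actual != entry["md5"]:
            mismatches.append((entry["relpath"], entry["md5"], actual))
    return mismatches
```

Calling find_mismatches with the path of the .dir file in the cache and the path of the output folder in the workspace returns an empty list when every tracked file matches. An empty list while the hook still complains would be consistent with the behaviour reported here.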

At the moment I don't really know what the problem is, and would appreciate assistance.

Reproduce

I am not able to reproduce this at will. I haven't yet figured out exactly what makes this happen.

Expected

The DVC post-checkout hook should complete without errors if I don't have any unsaved files. Alternatively, if I do have unsaved files, I would expect dvc status to point them out to me, or at least that their MD5 hash would not match the one tracked by dvc (and would then match it after something like dvc checkout --force).

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.55.2 (pip)
-------------------------
Platform: Python 3.10.15 on macOS-15.0.1-arm64-arm-64bit
Subprojects:
    dvc_data = 3.16.5
    dvc_objects = 5.1.0
    dvc_render = 1.0.2
    dvc_task = 0.3.0
    scmrepo = 3.3.7
Supports:
    http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    s3 (s3fs = 2024.2.0, boto3 = 1.34.34)
Config:
    Global: /Users/juliano/Library/Application Support/dvc
    System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s3s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/8ac7a2e9eb78ffa8d315cce7b95313f0

Pre-commit configuration:

---
fail_fast: true

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
        exclude: '.*dvc\.lock'
      - id: end-of-file-fixer
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$'
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
        args: ["--maxkb=3000"]
      - id: debug-statements
        language_version: python3
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$'
        language_version: python3
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$'
        name: isort (python)
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        args: ["--max-line-length=225"]
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$|^src/catella/btr/dash/dataiku\.py$|^src/catella/btr/utils/data_utils\.py$'
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.3.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests, types-PyYAML]
        exclude: '^(recipes|lib|datasets|zones|ipython_notebooks|statistics_worksheets|explore)/|params\.json$|^src/catella/btr/dash/dataiku\.py$|^src/catella/property_research_agent/main\.py$|^src/catella/property_research_agent/app\.py$'
  - repo: local
    hooks:
      - id: pytest-check
        name: pytest
        entry: pytest tests/
        language: system
        pass_filenames: false
        always_run: true
        stages:
          - pre-commit
  - repo: https://github.com/iterative/dvc
    rev: 3.55.2
    hooks:
      - id: dvc-pre-push
        additional_dependencies: [".[s3]"]
        language_version: python3
        stages:
          - push
      - always_run: true
        id: dvc-post-checkout
        additional_dependencies: [".[s3]"]
        language_version: python3
        stages:
          - post-checkout

KansaiUser commented 2 weeks ago

This also happens in the DagsHub tutorial.