iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

`dvc data status` key error #10402

Closed mattangus closed 2 months ago

mattangus commented 2 months ago

Bug Report

Running `dvc data status` gives me an error on one machine but not on another:

Error

```
dvc data status -v 10:23:23
2024-04-24 10:23:26,182 DEBUG: v3.50.0 (pip), CPython 3.12.2 on Linux-6.5.0-28-generic-x86_64-with-glibc2.35
2024-04-24 10:23:26,182 DEBUG: command: /home/matt/workspace/virtual_environments/py-3.12/bin/dvc data status -v
2024-04-24 10:23:26,494 ERROR: unexpected error - b'2c6373811567f2b2023f065fb5a333fdeefd54bb'
Traceback (most recent call last):
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/commands/data.py", line 110, in run
    status = self.repo.data_status(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/repo/data.py", line 234, in status
    git_info = _git_info(repo.scm, untracked_files=untracked_files)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/repo/data.py", line 141, in _git_info
    staged, unstaged, untracked = scm.status(untracked_files=untracked_files)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/scmrepo/git/__init__.py", line 307, in _backend_func
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 880, in status
    staged, unstaged, untracked = git_status(
                                  ^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/porcelain.py", line 1318, in status
    tracked_changes = get_tree_changes(r)
                      ^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/porcelain.py", line 1456, in get_tree_changes
    for change in index.changes_from_tree(r.object_store, tree_id):
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/index.py", line 553, in changes_from_tree
    yield from changes_from_tree(
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/index.py", line 657, in changes_from_tree
    for name, mode, sha in iter_tree_contents(object_store, tree):
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/object_store.py", line 1745, in iter_tree_contents
    tree = store[entry.sha]
           ~~~~~^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/object_store.py", line 154, in __getitem__
    type_num, uncomp = self.get_raw(sha1)
                       ^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/object_store.py", line 601, in get_raw
    raise KeyError(hexsha)
KeyError: b'2c6373811567f2b2023f065fb5a333fdeefd54bb'

2024-04-24 10:23:26,517 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2024-04-24 10:23:26,517 DEBUG: Removing '/home/matt/workspace/HA/OD-Stuff/.pQ5HNP36nqZbOfuABjeJDA.tmp'
2024-04-24 10:23:26,517 DEBUG: Removing '/home/matt/workspace/HA/OD-Stuff/.pQ5HNP36nqZbOfuABjeJDA.tmp'
2024-04-24 10:23:26,517 DEBUG: Removing '/home/matt/workspace/HA/OD-Stuff/.pQ5HNP36nqZbOfuABjeJDA.tmp'
2024-04-24 10:23:26,517 DEBUG: Removing '/home/matt/workspace/HA/OD-Stuff/ha_gym/.dvc/.cache/files/md5/.cOGbTVUVskCrA6vs0mD64Q.tmp'
2024-04-24 10:23:26,525 DEBUG: Version info for developers:
DVC version: 3.50.0 (pip)
-------------------------
Platform: Python 3.12.2 on Linux-6.5.0-28-generic-x86_64-with-glibc2.35
Subprojects:
    dvc_data = 3.15.1
    dvc_objects = 5.0.0
    dvc_render = 1.0.1
    dvc_task = 0.3.0
    scmrepo = 3.1.0
Supports:
    http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    s3 (s3fs = 2024.2.0, boto3 = 1.34.34)
Config:
    Global: /home/matt/.config/dvc
    System: /etc/xdg/xdg-ubuntu/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1p3
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme1n1p3
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/b11a8fb5114eb46d6400fbaefadf5890

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-04-24 10:23:26,527 DEBUG: Analytics is enabled.
2024-04-24 10:23:26,550 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp3i_1azl0', '-v']
2024-04-24 10:23:26,557 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp3i_1azl0', '-v'] with pid 141995
```
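The traceback ends with an object-store lookup raising `KeyError` for a SHA while a tree is being walked. The sketch below is a toy model of that failure mode, not dulwich's actual implementation. The dict-based store, the tuple layout of tree entries, and the helper `find_missing` are all assumptions made for illustration. It shows how one tree entry pointing at an absent object aborts the whole walk.

```python
# Toy model (an assumption, not dulwich's code): the object store is a dict
# mapping object id -> object. A tree is a list of (name, is_tree, child_id)
# entries; a blob is a bytes value.

def iter_tree_contents(store, tree_id):
    """Yield (path, blob_id) for every blob reachable from tree_id."""
    todo = [("", tree_id)]
    while todo:
        prefix, object_id = todo.pop()
        tree = store[object_id]  # raises KeyError if the tree is absent
        for name, is_tree, child_id in tree:
            if is_tree:
                todo.append((prefix + name + "/", child_id))
            else:
                yield prefix + name, child_id


def find_missing(store, tree_id):
    """Return ids referenced from tree_id that are not in the store."""
    missing = []
    todo = [tree_id]
    while todo:
        object_id = todo.pop()
        if object_id not in store:
            missing.append(object_id)
            continue
        obj = store[object_id]
        if isinstance(obj, list):  # only trees have children
            todo.extend(child_id for _, _, child_id in obj)
    return missing


# The root tree references subtree "t2", which the store does not contain.
store = {
    "root": [("README", False, "b1"), ("data", True, "t2")],
    "b1": b"hello",
}

try:
    list(iter_tree_contents(store, "root"))
except KeyError as exc:
    print("walk failed on missing object:", exc.args[0])

print("missing objects:", find_missing(store, "root"))
```

In this model the walk fails as soon as it reaches the dangling entry, which would be consistent with one clone failing while another clone that has the object works.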

Description

This seems to be related to the untracked changes in my working directory. However, the other machine, where this command works, also has many untracked changes. `dvc status` still works.

Reproduce

I'm not sure how to reproduce this issue.
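One way to narrow this down might be to check whether the object named in the `KeyError` exists in each machine's Git repository. The sketch below is a stdlib-only first pass that looks for a loose object file. The helper names `loose_object_path` and `has_loose_object` are hypothetical. It is a heuristic only: Git also keeps objects in packfiles under `objects/pack`, so a `False` result does not prove the object is missing. `git cat-file -e <sha>` or `git fsck` would give an authoritative answer.

```python
from pathlib import Path

# SHA taken from the KeyError in the log above.
MISSING_SHA = "2c6373811567f2b2023f065fb5a333fdeefd54bb"


def loose_object_path(git_dir, hexsha):
    """Path where Git would store this object if it is loose (not packed)."""
    return Path(git_dir) / "objects" / hexsha[:2] / hexsha[2:]


def has_loose_object(git_dir, hexsha):
    """True if a loose object file exists; False is inconclusive (packs)."""
    return loose_object_path(git_dir, hexsha).is_file()


if __name__ == "__main__":
    # Assumes the script runs from the top of the working tree, with a
    # regular .git directory rather than a worktree or submodule gitfile.
    print(loose_object_path(".git", MISSING_SHA))
    print(has_loose_object(".git", MISSING_SHA))
```

Running this on both machines and comparing the results could show whether the two clones differ in which objects they hold.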

Expected

On the other machine, the same command outputs `No changes.`

Environment information

Output of `dvc doctor`:

$ dvc doctor
DVC version: 3.50.0 (pip)
-------------------------
Platform: Python 3.12.2 on Linux-6.5.0-28-generic-x86_64-with-glibc2.35
Subprojects:
    dvc_data = 3.15.1
    dvc_objects = 5.0.0
    dvc_render = 1.0.1
    dvc_task = 0.3.0
    scmrepo = 3.1.0
Supports:
    http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    s3 (s3fs = 2024.2.0, boto3 = 1.34.34)
Config:
    Global: /home/matt/.config/dvc
    System: /etc/xdg/xdg-ubuntu/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1p3
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme1n1p3
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/b11a8fb5114eb46d6400fbaefadf5890

Additional Information (if any):

dberenbaum commented 2 months ago

Thanks for the report. Unfortunately, because it's not reproducible and the error comes from dulwich rather than dvc, I am going to close this one; it does not look like there's much we can do.