Open Arob113 opened 3 weeks ago
I'm not sure I understand the description. @Arob113, could you please explain it in more detail and/or suggest a better title for the issue?
A dir named with the first two digits of the md5, containing the cache file, shows up in SOURCE_DIR/.dvc/cache
What does it mean? Why is it a problem?
@shcheklein, this means that it fetches the file into the cache, but it doesn't add the file to my repo. I would need to manually move the cache dir and re-pull.
Regarding "I would need to manually move the cache dir and re-pull": can you share the exact command, please?
dvc pull -r aws-legacy all.csv
@Arob113, which part of this command is "manually move the cache dir"? Could you please share all the steps / commands / details?
@shcheklein, I run that pull command and it saves the cache file in .dvc/cache/XX/... . I then manually copy that cache file and its folder into .dvc/cache/files/md5 (resulting in .dvc/cache/files/md5/XX/...), rerun the pull, and the proper CSV is checked out.
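The manual workaround described above (copying objects from the legacy .dvc/cache/XX/... layout into .dvc/cache/files/md5/XX/...) could be scripted roughly as follows. This is a minimal sketch, not part of DVC: copy_legacy_to_v3 and file_md5 are hypothetical helpers, and it assumes the default local cache at .dvc/cache. Because DVC 2 and DVC 3 may compute different md5s for the same text file, the sketch only copies an object when the md5 of its raw bytes matches its name. DVC's own `dvc cache migrate` remains the supported route.

```python
import hashlib
import shutil
from pathlib import Path

HEX = set("0123456789abcdef")


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 of a file's raw bytes, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_legacy_to_v3(cache_dir: Path) -> list[str]:
    """Hypothetical helper: copy objects from cache_dir/XX/rest into
    cache_dir/files/md5/XX/rest. Objects whose raw-byte md5 does not match
    their name are skipped. Returns the md5s that were copied."""
    copied = []
    target_root = cache_dir / "files" / "md5"
    # sorted() materializes the listing before we create files/md5/...
    for prefix_dir in sorted(cache_dir.iterdir()):
        # Legacy object dirs are named with exactly two lowercase hex chars.
        name = prefix_dir.name
        if not prefix_dir.is_dir() or len(name) != 2 or not set(name) <= HEX:
            continue
        for obj in sorted(prefix_dir.iterdir()):
            if not obj.is_file():
                continue
            md5 = name + obj.name
            dest = target_root / name / obj.name
            if dest.exists():
                continue  # already present in the DVC 3 layout
            if file_md5(obj) != md5:
                continue  # content hash differs from the name; leave it alone
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(obj, dest)
            copied.append(md5)
    return copied
```

Usage, from the repository root: copy_legacy_to_v3(Path(".dvc/cache")), then rerun the pull. Verifying the hash before copying means a mismatched object is skipped rather than placed under a name it does not hash to.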
@Arob113, did you use DVC 2 before? E.g., did you push into aws-legacy with DVC 2? Can you also share the content of all.csv.dvc (at least its structure)?
Some of the affected files were in DVC 2.x before, but others were only ever saved with DVC 3. all.csv specifically was originally DVC 2. I have tried the local cache migration fix, but that didn't seem to work either.
all.csv.dvc:
outs:
- hash: md5
  md5: 7e643e15408257a8a04befacb2320ecd
  path: all.csv
  size: 322629732
  cloud:
    aws-legacy:
      etag: caae4444654e35810c55e708d31a7304-7
      version_id: rVDuvJLrfq.iIsfiET25izm3901gzwtf
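Going by the two layouts mentioned in this thread, the md5 in this .dvc entry maps to two possible local cache locations. The sketch below computes both; cache_paths is a hypothetical helper, and it assumes the default local cache directory .dvc/cache.

```python
from pathlib import PurePosixPath


def cache_paths(md5: str, cache_dir: str = ".dvc/cache") -> dict[str, PurePosixPath]:
    """Hypothetical helper: where an object with this md5 would live in the
    legacy (DVC 2) and DVC 3 local cache layouts."""
    if len(md5) != 32 or not set(md5) <= set("0123456789abcdef"):
        raise ValueError(f"not a lowercase hex md5: {md5!r}")
    prefix, rest = md5[:2], md5[2:]  # first two hex digits name the directory
    root = PurePosixPath(cache_dir)
    return {
        "legacy": root / prefix / rest,
        "dvc3": root / "files" / "md5" / prefix / rest,
    }


paths = cache_paths("7e643e15408257a8a04befacb2320ecd")
print(paths["legacy"])  # .dvc/cache/7e/643e15408257a8a04befacb2320ecd
print(paths["dvc3"])    # .dvc/cache/files/md5/7e/643e15408257a8a04befacb2320ecd
```

If the pulled object shows up only at the "legacy" path, that matches the behavior reported above.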
The file structure is already DVC 3.0. Is the bug you described happening with this file, or with its previous (DVC 2) version? Was cloud versioning enabled for the remote storage before?
Did you run the migration with --dvc-files?
(I'm still trying to understand the full picture so I can reproduce this; otherwise it's quite hard to guess / understand what is happening.)
Bug Report
Expected
Environment information
DVC version: 3.50.0 (pip)
Platform: Python 3.10.9 on Linux-6.5.0-28-generic-x86_64-with-glibc2.35
Subprojects:
  dvc_data = 3.15.1
  dvc_objects = 5.1.0
  dvc_render = 1.0.2
  dvc_task = 0.4.0
  scmrepo = 3.3.5
Supports:
  http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
  https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
  s3 (s3fs = 2024.5.0, boto3 = 1.34.106)
Additional Information (if any):