Closed Luux closed 3 months ago
Just to add my two cents, here ...
I actually work on the same team as the OP. We also tried from different machines, which also did not change anything.
However, there is one piece of information that I can add: when I tried to pull the directory in question, I could verify that at least one file from the original directory was actually fetched to my local cache. (I manually searched my cache for the corresponding entry via the hash that I knew from the original.) However, it did not appear in my working directory after checkout (or pull). What we could still do is compare the cache and see whether all files in question have been transferred to my local cache. That could mean that something does not work on checkout.
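The cache comparison suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the DVC 3.x cache layout (`<cache>/files/md5/<first two hash characters>/<rest>`) and that a directory's `.dir` object is a JSON list of entries with `md5` and `relpath` keys. The function names `object_path` and `missing_from_cache` are hypothetical helpers, not part of DVC.

```python
import json
from pathlib import Path


def object_path(cache_root: Path, md5: str) -> Path:
    """Map a hash to its expected location in the cache.

    Assumption: DVC 3.x layout, <cache_root>/files/md5/<first 2 chars>/<rest>.
    Directory hashes keep their ".dir" suffix as part of the file name.
    """
    return cache_root / "files" / "md5" / md5[:2] / md5[2:]


def missing_from_cache(cache_root: Path, dir_md5: str) -> list[str]:
    """Return the relpaths listed in a .dir manifest whose objects are absent."""
    manifest = object_path(cache_root, dir_md5)
    if not manifest.is_file():
        raise FileNotFoundError(f"directory manifest not in cache: {manifest}")
    # Assumption: the manifest is a JSON list of {"md5": ..., "relpath": ...}.
    entries = json.loads(manifest.read_text())
    return [
        entry["relpath"]
        for entry in entries
        if not object_path(cache_root, entry["md5"]).is_file()
    ]
```

Called with the local cache directory and the directory hash recorded in `dvc.lock`, an empty result would suggest that all files were fetched and that the problem lies in checkout; a non-empty result would point at the transfer instead.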
As I stated above, when pushing the directory, only `runs` is created, and it contains only ~11mb. No `files` folder is present.
If I add the same data with the same hashes via `dvc add`, the `files` folder is created and takes up ~1gb. `dvc pull` of the manually added files works; pulling the original stage output folder does not.
So I can indeed confirm that it is not just a checkout bug.
Update: we found the cause. In one of the commits of the current branch, the `out` folder was set to `cache: false`. So naturally, the files were not uploaded to the cache anymore.
As the old state was indeed cached, this caused weird behaviour of dvc. A warning or similar could be useful in this case instead of dvc just silently behaving this way. Note that this affects all the outs of all `my_stage`s, but only `my_stage@28` indicated that something was going wrong at all.
As a suggestion, a warning/error when trying to `dvc push` or `dvc pull` an `out` where `cache: false` is set would be helpful.
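Until such a warning exists, a check along these lines could flag the affected outs before pushing. It is a minimal sketch, not DVC's own behaviour: it assumes `dvc.yaml` has already been parsed into a plain mapping (for example with `yaml.safe_load` from PyYAML, which is left to the caller), that each entry under `outs` is either a plain path or a one-key mapping from path to flags, and that `foreach` stages keep their definition under `do`. The name `uncached_outs` is hypothetical.

```python
from typing import Any, Iterator


def uncached_outs(dvc_yaml: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Yield (stage_name, out_path) for every out declared with cache: false."""
    for name, stage in (dvc_yaml.get("stages") or {}).items():
        # Assumption: foreach stages nest cmd/outs under "do";
        # all other stages define them directly.
        body = stage.get("do", stage)
        for out in body.get("outs") or []:
            if isinstance(out, str):
                continue  # plain path, cached by default
            for path, flags in out.items():
                if isinstance(flags, dict) and flags.get("cache") is False:
                    yield name, path
```

Running this over the `dvc.yaml` of each branch would have shown that the outs of `my_stage` switched to `cache: false` on the newer branch, which is why nothing was pushed or pulled for them.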
Closing this for now. Please feel free to create a separate issue for a warning/error when trying to `dvc push` or `dvc pull` an out where `cache: false` is set.
Bug Report
Description
I want to pull the results of a given stage.
`dvc pull` claims `everything is up to date`, but the folder is not created on my local machine.
Our setup:
- `run_28/outdir`
- `my_stage@28` (all the others seem to work!)
What I've tried so far:
- `dvc push` repeatedly on our server where the stage ran: it always claims that ~1k files have been pushed, over and over again. The latter is also true if I just `dvc push` from the root dir
- […] `repo.site_cache_dir` - does not work
- […] `my_stage`s again and pushed to get a clean state again -> the same folder `run_28` is broken again
- `dvc push run_28/outdir` to an entirely new dvc storage cache - same problem
- […] `my_stage@28` entry from `dvc.lock` -> `dvc commit run_28/outdir` -> push to new cache -> same problem
- […] `run_28/outdir` to `run_28/outdir_test` and did a `dvc add`, so I get a `run_28/outdir_test.dvc` with the very same output hash as the out from `my_stage@28` --> `dvc pull run_28/outdir_test` WORKS, `dvc pull run_28/outdir` still does not do anything...
It's even worse:
- `dvc pull run_28/outdir` from an older branch works; after switching to the new branch, `dvc pull run_28/outdir` says "everything is up-to-date", but there should be changes to the files
Reproduce
The data affected is customer data, so I cannot provide the files. For the other `my_stage` entries, everything seems to work...
Expected
`dvc pull` should work as expected or at least show a meaningful error message
Environment information
Output of `dvc doctor`:
Local machine:
Worker machine:
Additional Information (if any):
- `dvc pull` also does not work for this directory when I'm running it on the worker machine, so it's not a config issue on my local side
- […] `run` directory only takes up ~11mb. If I push the `run_28/outdir_test.dvc` (see above), the created `files` dir takes up ~1gb - it should include the same files, right?
- `dvc push run_28/outdir -v`:
- `dvc pull` is similar: