Bug Report
Description
The documentation for import-url explains that running this command should create a DVC metadata file with the pointer and hash information for the source data file, and that it should not download the data immediately. That works as expected.
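The exact invocation is not reproduced above; a minimal sketch of the kind of command described, assuming an S3 source and the --no-download option (the bucket path here is hypothetical), would be:
dvc import-url --no-download s3://example-bucket/data.csv data.csv
With --no-download, this should write a data.csv.dvc metadata file containing the source URL and hash information without fetching the file itself.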
The documentation also states that if I later run
dvc pull data.csv
at that point, it will download the data and place it in my work tree. (It's not clear to me whether the data will also be added to the cache.) This doesn't work; instead I get:
> dvc pull data.csv
Collecting
Fetching
Building workspace index
Comparing indexes
Applying changes
Everything is up to date.
ERROR: failed to pull data from the cloud - Checkout failed for following targets:
data.csv
Is your cache up to date?
<https://error.dvc.org/missing-files>
Reproduce
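A minimal sequence matching the description above (the S3 path is again hypothetical):
dvc import-url --no-download s3://example-bucket/data.csv data.csv
dvc pull data.csv
The second command is expected to download data.csv into the work tree, but instead it fails with the checkout error shown above.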
Expected
Based on the documentation, my expectation is that the pull should copy data.csv to my local work tree from S3, that data.csv should not be added to the cache, and that any changes to data.csv in S3 should cause local pipelines that use data.csv as a dependency to be flagged as out of date.
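For the last point, a hypothetical dvc.yaml stage (not taken from the report) that uses the imported file as a dependency might look like:
stages:
  prepare:
    cmd: python prepare.py data.csv
    deps:
    - data.csv
    outs:
    - prepared
This is the kind of pipeline the out-of-date expectation above refers to.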
This expected behavior is explained in a couple of places in the documentation:
Environment information
Output of dvc doctor:
Additional Information (if any):