iterative / dataset-registry

Dataset registry DVC project

Docs mention a file that is missing (in tutorial) #5

Closed MayankGoel28 closed 4 years ago

MayankGoel28 commented 4 years ago

https://dvc.org/doc/tutorials/get-started/add-files gives this command:

dvc get https://github.com/iterative/dataset-registry \get-started/data.xml -o data/data.xml

As shown on the page, it contains some extra newlines and could not be copy-pasted correctly.

However, https://github.com/iterative/dataset-registry/tree/master/get-started contains no data.xml, so following the tutorial returns this error:

ERROR: failed to download 'https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355' to 'data/.CE4hKPGKTam3UAWsZ7pUaG/a3/04afb96060aad90176268345e10355' - HTTPSConnectionPool(host='s3-us-east-2.amazonaws.com', port=443): Read timed out.
ERROR: failed to get 'get-started/data.xml' from 'https://github.com/iterative/dataset-registry' - 1 files failed to download

Changing the path in the command to data.xml.dvc works.

shcheklein commented 4 years ago

dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml

This command (without the \) should work, so it looks like your copy-paste went wrong.

I agree, though, that the error is extremely obscure:

(.env) [ivan@ivan ~/Projects/test-gs]$ dvc get https://github.com/iterative/dataset-registry \get-started/data.xml -o data/data.xml
ERROR: unexpected error - [Errno 20] Not a directory: '/Users/ivan/Projects/test-gs/data/.2tNqCjk4JGpHdSUT2AQvj8'

I would create a ticket in iterative/dvc for this.

shcheklein commented 4 years ago

Closing this here for now, since it's not a dataset registry bug.