iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

get-url: naming confusion #6494

Open: dberenbaum opened this issue 3 years ago

dberenbaum commented 3 years ago

Extracted from https://github.com/iterative/dvc/issues/6485#issuecomment-904983492:

dvc get-url and dvc.api.get_url() don't really do the same thing unfortunately

Having a CLI command and an API method with the same name but different functionality is probably not ideal ☹️.

efiop commented 3 years ago

A bit unrelated, but it might be time to revisit an old idea of having just get/import:

dvc get some/path  # same as get-url
dvc get some/path --repo https://github.com/iterative/dataset-registry  # same as current get

This would also simplify dvc get . some/path, since it would become just dvc get some/path when you are inside the current repo. The same goes for import and list.
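To make the routing in this proposal concrete, here is a minimal sketch of how a single get could decide between today's two behaviors. The names plan_get and GetPlan are hypothetical and not part of DVC. The comment doesn't pin down whether a bare local path with no --repo should act like get-url or like get from the current repo; this sketch assumes the latter, following the dvc get . some/path remark, and treats anything containing "://" as a URL for get-url.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GetPlan:
    """Hypothetical description of what a unified `dvc get` would do."""
    mode: str               # "url" = today's get-url, "repo" = today's get
    path: str
    repo: Optional[str] = None


def plan_get(path: str, repo: Optional[str] = None) -> GetPlan:
    """Route a unified `get` call to one of the two existing behaviors.

    Illustrative only: `plan_get` and `GetPlan` are hypothetical names,
    not part of DVC's CLI or Python API.
    """
    if repo is not None:
        # Explicit --repo: same as today's `dvc get <repo> <path>`.
        return GetPlan(mode="repo", path=path, repo=repo)
    if "://" in path:
        # A URL such as s3://bucket/path: same as today's `dvc get-url`.
        return GetPlan(mode="url", path=path)
    # Assumption: a bare path with no --repo refers to the repo you are in,
    # i.e. `dvc get some/path` instead of `dvc get . some/path`.
    return GetPlan(mode="repo", path=path, repo=".")
```

For example, plan_get("s3://bucket/path") yields mode "url", while plan_get("some/path") yields mode "repo" with repo ".". Keeping the decision in one small function makes the ambiguous case (bare local path) an explicit, easily changed choice.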

We will also soon have a dvc fs in fsspec, for which we'll reserve the dvc:// scheme, so we could use it here too:

dvc get dvc://iterative/dataset-registry/some/path

dberenbaum commented 3 years ago

dvc get dvc://iterative/dataset-registry/some/path

This syntax looks nice. Should github.com be in the path, or am I misunderstanding?

efiop commented 3 years ago

@dberenbaum I was thinking of GitHub as the default, for convenience (kinda like namespaces in Homebrew). We could definitely add something to support raw git URLs as well.
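A minimal sketch of how a dvc://org/repo/path URL might be expanded under the "GitHub by default" idea. This is not how DVC or its fsspec filesystem actually parses URLs; the function name parse_dvc_url and the host parameter (a stand-in for supporting other git hosts) are hypothetical. It uses only the standard library's urllib.parse.urlsplit.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_dvc_url(url: str, host: str = "github.com") -> Tuple[str, str]:
    """Split a hypothetical dvc://org/repo/path URL into (repo_url, path).

    The org/repo pair is assumed to live on `host`, GitHub by default,
    similar to Homebrew-style namespaces.
    """
    parts = urlsplit(url)
    if parts.scheme != "dvc":
        raise ValueError(f"not a dvc:// URL: {url!r}")
    org = parts.netloc  # "iterative" in dvc://iterative/...
    # The first path segment is the repo name; the rest is the in-repo path.
    repo, _, path = parts.path.lstrip("/").partition("/")
    if not org or not repo:
        raise ValueError(f"expected dvc://<org>/<repo>/<path>, got {url!r}")
    return f"https://{host}/{org}/{repo}", path
```

Usage: parse_dvc_url("dvc://iterative/dataset-registry/some/path") gives ("https://github.com/iterative/dataset-registry", "some/path"), which is the same pair the current dvc get takes as its two positional arguments.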

efiop commented 3 years ago

Btw, one more comment about dvc list: with this new approach, it could be used as a universal list command, e.g. like aws s3 ls for S3:

dvc list s3://bucket/path
dvc list dvc://iterative/dataset-registry/path

Just thinking out loud. Clearly not the highest priority or anything like that.

mike0sv commented 2 years ago

My 2 cents: in mlem we have the mlem get CLI command (and mlem.api.get respectively). It can be used with https://github.com/org/repo/path or https://github.com/org/repo/tree/rev/path directly, or with path --repo https://github.com/org/repo --rev rev. Internally, it just builds a URL like the ones in the first examples from the --repo and --rev arguments.
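A minimal sketch of the two-way mapping described above, between path + --repo + --rev and a single URL. It is not mlem's actual implementation; build_url and split_url are hypothetical names. It assumes GitHub-style org/repo URLs and revisions without slashes (a branch like feature/x would be misparsed), and a path whose first directory is literally "tree" would be ambiguous.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def build_url(path: str, repo: str, rev: Optional[str] = None) -> str:
    """Combine path/--repo/--rev into one URL, in the style described above."""
    repo = repo.rstrip("/")
    path = path.lstrip("/")
    if rev is None:
        return f"{repo}/{path}"             # https://github.com/org/repo/path
    return f"{repo}/tree/{rev}/{path}"      # https://github.com/org/repo/tree/rev/path


def split_url(url: str) -> Tuple[str, Optional[str], str]:
    """Inverse of build_url: recover (repo, rev, path) from a full URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) < 2:
        raise ValueError(f"expected .../<org>/<repo>/..., got {url!r}")
    repo = f"{parts.scheme}://{parts.netloc}/{segments[0]}/{segments[1]}"
    rest = segments[2:]
    if len(rest) >= 2 and rest[0] == "tree":
        # Assumption: the segment right after "tree" is the whole revision.
        return repo, rest[1], "/".join(rest[2:])
    return repo, None, "/".join(rest)
```

Keeping both directions means the CLI can accept either spelling and normalize to one internal form before fetching.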