Open dberenbaum opened 3 years ago
A bit unrelated, but it might be time to revisit an old idea of just having get/import:
dvc get some/path # same as get-url
dvc get some/path --repo https://github.com/iterative/dataset-registry # same as current get
This would also simplify the in-repo case: dvc get . some/path would become just dvc get some/path if you are in the current repo. The same applies to import and list.
We will also have a dvc fs in fsspec soon, where we'll reserve the dvc:// scheme for ourselves, so we could use it here too:
dvc get dvc://iterative/dataset-registry/some/path
dvc get dvc://iterative/dataset-registry/some/path
This syntax looks nice. Should github.com be in the path, or am I misunderstanding?
@dberenbaum I thought of that one as the default, for best comfort (kinda like namespaces in homebrew). We could definitely do something to support raw git URLs as well.
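To make the "GitHub as default" idea concrete, here is a minimal sketch of how a dvc:// URL could be split into a repo URL and an in-repo path. This is not DVC's implementation: resolve_dvc_url and DEFAULT_HOST are hypothetical names, and the org/repo/path layout is only assumed from the example above.

```python
from urllib.parse import urlsplit

# Assumption: github.com is the default host, as suggested in this thread.
DEFAULT_HOST = "https://github.com"


def resolve_dvc_url(url):
    """Split a hypothetical dvc://org/repo/path URL into (repo_url, path)."""
    parts = urlsplit(url)
    if parts.scheme != "dvc":
        raise ValueError(f"not a dvc:// URL: {url}")
    # urlsplit treats the first component after "//" as the network location,
    # which this sketch interprets as the organization name.
    org = parts.netloc
    # The first remaining path segment is the repo; the rest is the in-repo path.
    repo, _, path = parts.path.lstrip("/").partition("/")
    if not org or not repo:
        raise ValueError(f"expected dvc://org/repo[/path], got: {url}")
    return f"{DEFAULT_HOST}/{org}/{repo}", path


repo_url, path = resolve_dvc_url("dvc://iterative/dataset-registry/some/path")
print(repo_url, path)
```

Supporting raw git URLs would need a different form, since the host is implied here rather than spelled out.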
Btw, one more comment about dvc list: with this new approach, it could be used as a universal list command, e.g. like aws s3 ls for s3.
dvc list s3://bucket/path
dvc list dvc://iterative/dataset-registry/path
Just thinking out loud. Clearly not the highest priority or anything like that.
My 2 cents: in mlem we have the mlem get CLI command (and mlem.api.get, respectively), and it can be used with https://github.com/org/repo/path or https://github.com/org/repo/tree/rev/path directly, or with path --repo https://github.com/org/repo --rev rev. Internally, it just constructs a value like the first examples from the repo and rev arguments.
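As a rough illustration of that construction (not mlem's actual code), the sketch below combines a path with optional repo and rev arguments into the two URL forms mentioned above. The function name build_location is hypothetical.

```python
def build_location(path, repo=None, rev=None):
    """Combine path, repo and rev into a single location string.

    Hypothetical helper: mirrors the URL shapes quoted in this thread,
    not any real mlem or DVC API.
    """
    if repo is None:
        # No repo given: the path is used as-is.
        return path
    repo = repo.rstrip("/")
    if rev is None:
        # Form 1: https://github.com/org/repo/path
        return f"{repo}/{path}"
    # Form 2: https://github.com/org/repo/tree/rev/path
    return f"{repo}/tree/{rev}/{path}"


print(build_location("path", repo="https://github.com/org/repo", rev="rev"))
```

With this shape, the --repo/--rev flags and the single-URL form are just two spellings of the same location.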
Extracted from https://github.com/iterative/dvc/issues/6485#issuecomment-904983492:
Having a CLI command and an API method with the same name but different functionality is probably not ideal ☹️.