iterative / scmrepo

SCM wrapper and fsspec filesystem for Git for use in DVC.
https://dvc.org
Apache License 2.0

support getting revision without cloning a repository #321

Closed by skshetry 9 months ago

skshetry commented 9 months ago

This would work much like git ls-remote:

$ git ls-remote git@github.com:iterative/scmrepo.git 3.0.0
04ca00133f2cddf4cc10971de28507c53f50ad7c        refs/tags/3.0.0

$ git ls-remote https://github.com/iterative/dvc.git main
44b78b81b99bac06bf4549487656a5d20005f5b6        refs/heads/main

This will only work for references (branches and tags), so it would not resolve arbitrary commit hashes (and I may be missing other cases). But it could be a fast path for us.
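A minimal sketch of what such a fast path could look like, assuming the git CLI is installed and simply shelling out to git ls-remote. The names parse_ls_remote, pick_ref, and resolve_remote_rev are hypothetical and are not part of scmrepo's API.

```python
import subprocess
from typing import Dict, Optional


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map each ref name to its object ID from `git ls-remote` output."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # "<sha> <ref>"; ignore anything else
            sha, ref = parts
            refs[ref] = sha
    return refs


def pick_ref(refs: Dict[str, str], rev: str) -> Optional[str]:
    """Pick the object ID for a short name such as "main" or "3.0.0".

    Tags are tried before branches, mirroring how git disambiguates
    short names. A peeled entry ("^{}") is preferred for annotated
    tags because it names the commit rather than the tag object.
    """
    candidates = (
        f"refs/tags/{rev}^{{}}",
        f"refs/tags/{rev}",
        f"refs/heads/{rev}",
        rev,  # already a full ref name, e.g. "refs/heads/main"
    )
    for name in candidates:
        if name in refs:
            return refs[name]
    return None


def resolve_remote_rev(url: str, rev: str) -> Optional[str]:
    """Hypothetical helper: resolve a branch or tag without cloning.

    Returns None when the remote advertises no matching reference,
    which is what happens for plain commit hashes.
    """
    proc = subprocess.run(
        ["git", "ls-remote", url, rev],
        check=True,
        capture_output=True,
        text=True,
    )
    return pick_ref(parse_ls_remote(proc.stdout), rev)
```

Calling resolve_remote_rev("https://github.com/iterative/dvc.git", "main") would contact the server and return whatever main points to at that moment. A None result is the signal to fall back to the slower clone-based path.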

dberenbaum commented 9 months ago

What will it be used for @skshetry? Just wondering if there is some broader context around this.

skshetry commented 9 months ago

> What will it be used for @skshetry? Just wondering if there is some broader context around this.

It might be useful for tracking (streaming) dvc datasets. For tracking a dvc dataset, all you need is a revision.

For example:

$ dvc ds add --url git@github.com:iterative/example-get-started.git --rev=main --type dvc --name example-get-started --path data/data.xml

We can record the revision that main points to and keep it as rev_lock in dvc.lock. Later, on dvc ds update, we can check which revision main now refers to and see whether the dataset can be updated.

At the moment, we need to clone the repository, which can be expensive. If we could just ask the remote server, this operation would be much cheaper.
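The update check described here could be sketched roughly as follows. This is only an illustration, not how DVC implements it: ls_remote_sha and check_update are hypothetical names, the resolver shells out to the git CLI, and rev_lock stands for the value previously stored in dvc.lock.

```python
import subprocess
from typing import Callable, Optional


def ls_remote_sha(url: str, rev: str) -> Optional[str]:
    """Ask the remote what a branch or tag points to, without cloning."""
    proc = subprocess.run(
        ["git", "ls-remote", url, rev],
        check=True,
        capture_output=True,
        text=True,
    )
    for line in proc.stdout.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        sha, ref = parts
        if ref in (f"refs/heads/{rev}", f"refs/tags/{rev}"):
            return sha
    return None


def check_update(
    url: str,
    rev: str,
    rev_lock: str,
    resolve: Callable[[str, str], Optional[str]] = ls_remote_sha,
) -> Optional[str]:
    """Return the new revision if `rev` has moved past `rev_lock`, else None.

    `resolve` is injectable so the comparison can be exercised
    without network access.
    """
    latest = resolve(url, rev)
    if latest is None:
        # Not a branch or tag on the remote (e.g. a plain commit hash).
        raise ValueError(f"{rev!r} is not a reference on {url}")
    return latest if latest != rev_lock else None
```

With this shape, the update step only needs one round trip to the server: if check_update returns None, the locked revision is still current and nothing has to be fetched.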

dberenbaum commented 9 months ago

Okay, makes a lot of sense in that case. Thanks for the context!