iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

`dvc import` compatible with GitHub App Token #8068

Open mikolajpabiszczak opened 2 years ago

mikolajpabiszczak commented 2 years ago

I haven't seen any proposal of this kind in the issues and - based on my use case - it could solve a number of problems.

Scenario:

Problem:

Proposition:

Disclaimer: I intended to write this some months ago, and at that time the desired behaviour was not in place. I took a quick look now, but did not find any mention of it.

Thanks for your effort and please ask any questions in case you need clarification!

dberenbaum commented 2 years ago

@casperdcl FYI. Any thoughts on this scenario?

casperdcl commented 2 years ago

I'm not sure I follow. Is the issue about authentication for dvc in CI using env vars? That's already supported (via https://dvc.org/doc/command-reference/remote/modify#available-parameters-per-storage-type) e.g. AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY.

Or do you mean DVC's deps.*.repo.url is a private repo that needs a PAT for pull access? In which case I guess DVC could support a REPO_TOKEN env var for authentication the same way CML does. Plus it would need a CLI API for it - presumably dvc import --token=... though not sure where it should store said token. Presumably not in dvc.yaml but in the system config? Would mean treating the repo URL like a data remote URL (i.e. give it a shortname, save creds in user config dirs, etc.)

mikolajpabiszczak commented 2 years ago

@casperdcl This one

> Or do you mean DVC's deps.*.repo.url is a private repo that needs a PAT for pull access? In which case I guess DVC could support a REPO_TOKEN env var for authentication the same way CML does. Plus it would need a CLI API for it - presumably dvc import --token=... though not sure where it should store said token. Presumably not in dvc.yaml but in the system config? Would mean treating the repo URL like a data remote URL (i.e. give it a shortname, save creds in user config dirs, etc.)

That said, I believe the PAT / App Token should not be stored, since (in the case of the App Token) it is re-generated every time the pipeline runs (e.g., in a GitHub Action). One idea for a solution would be an --import-token option that works with other dvc commands (e.g., dvc repro). When passed, anything obtained with dvc import would use that token to authenticate when checking out the repo under the url key.

dberenbaum commented 2 years ago

@dtrifiro Any idea how this should work after dulwich upgrades?

dtrifiro commented 2 years ago

@dberenbaum

If you're thinking of support for git credential helpers, one way this could work is the following:

  1. Set up a credential helper (could even be git credential-cache, if cli git is available)
  2. Store the credential in the helper
  3. Actually perform the operation.

For example:

printf '[credential]\n    helper = cache\n' >> ~/.gitconfig
printf "url=https://github.com\nusername=username\npassword=password\n" | git credential-cache store
dvc import https://github.com//[...]

This looks a bit clunky to me, although this would work starting with the next dvc release (see https://github.com/iterative/scmrepo/pull/138).
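
The three steps above could be wrapped for CI roughly as follows. This is a sketch under assumptions, not a confirmed DVC workflow: `REPO_TOKEN`, `format_credential`, and `store_token` are names made up for illustration, `x-access-token` is the username GitHub documents for App installation tokens over HTTPS, and it presumes a DVC release that includes the scmrepo change linked above. It uses `git credential approve`, which hands the credential to whichever helper is configured, rather than calling `git credential-cache` directly.

```shell
#!/bin/sh
# Sketch only: REPO_TOKEN and both function names are hypothetical.

# Print a credential in the key=value format that `git credential` reads.
# A blank line terminates the description.
format_credential() {
    host=$1
    token=$2
    printf 'protocol=https\nhost=%s\nusername=x-access-token\npassword=%s\n\n' \
        "$host" "$token"
}

# Steps 1 and 2: enable the in-memory cache helper (15 minute timeout),
# then pass the token to it through `git credential approve`.
store_token() {
    git config --global credential.helper 'cache --timeout=900'
    format_credential "$1" "$2" | git credential approve
}

# Step 3, run in the CI job once the token has been generated:
#   store_token github.com "$REPO_TOKEN"
#   dvc import <private-repo-url> <path>
```

Because the cache helper keeps credentials in memory and expires them, a token that is re-generated on every pipeline run is never written to disk.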

An alternative would be setting up credentials sections in the dvc config that can be looked up when performing import or import-url, something like:

['credential "https://github.com"']
username = username
password = password

It might also be worth providing facilities to write values to the config, something like:

dvc config set credential.https://github.com username username       
dvc config set credential.https://github.com password password       

Cons with this approach:

dberenbaum commented 2 years ago

Hm, in this case where there is an import from a data registry repo, can the token work over SSH, or would we need to convert to HTTP?
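
For context on the question above: SSH remotes authenticate with keys, while PATs and App tokens are presented over HTTPS, so an SSH-style URL would have to be rewritten before a token could be used. A minimal sketch of such a rewrite is below. The function name `ssh_to_https` is hypothetical, and nothing in this thread says DVC performs this conversion itself.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an SSH-style git URL to its HTTPS form.
# Handles both scp-like (git@host:path) and ssh://git@host/path syntax;
# URLs that are already HTTPS pass through unchanged.
ssh_to_https() {
    printf '%s\n' "$1" | sed -E 's#^(ssh://)?git@([^:/]+)[:/]#https://\2/#'
}

# Example:
#   ssh_to_https git@github.com:org/repo.git
```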

dberenbaum commented 1 year ago

A similar report from a user who wants to dvc import from a private repo inside their CI environment: https://discord.com/channels/485586884165107732/485596304961962003/1057317845744238644.

moisesrc13 commented 1 year ago

Hey, is there any update on a feature to import from a private repository without using a git SSH key?

dberenbaum commented 1 year ago

@moisesrc13 The credential helper support mentioned above is now implemented, so you should be able to use that and authenticate to a private repo in the same ways you can using the git cli.
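
For a token that is re-generated on every CI run, one option the git CLI supports is a custom credential helper that reads the token from the environment on each lookup, so nothing is stored. The script below is a sketch of that idea. The helper name `envtoken` and the `REPO_TOKEN` variable are assumptions for illustration, and whether DVC invokes custom helpers exactly as the git CLI does has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper script, e.g. saved as `git-credential-envtoken` on PATH.
# git calls a credential helper with one action argument: get, store, or erase.
envtoken_helper() {
    # Only answer lookups; ignore store/erase so nothing is persisted.
    if [ "${1:-}" = "get" ]; then
        # REPO_TOKEN is an assumed variable name, set by the CI job.
        printf 'username=x-access-token\npassword=%s\n' "${REPO_TOKEN:-}"
    fi
}

envtoken_helper "$@"

# One-time registration, scoped to a single host so the token is not
# offered elsewhere (assumes the script is executable and on PATH):
#   git config --global credential.https://github.com.helper envtoken
```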

moisesrc13 commented 1 year ago

> @moisesrc13 The credential helper support mentioned above is now implemented, so you should be able to use that and authenticate to a private repo in the same ways you can using the git cli.

Thanks. Will give it a try.