Open mikolajpabiszczak opened 2 years ago
@casperdcl FYI. Any thoughts on this scenario?
I'm not sure I follow. Is the issue about authentication for dvc in CI using env vars? That's already supported (via https://dvc.org/doc/command-reference/remote/modify#available-parameters-per-storage-type), e.g. AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY.
Or do you mean DVC's deps.*.repo.url is a private repo that needs a PAT for pull access? In which case I guess DVC could support a REPO_TOKEN env var for authentication the same way CML does. Plus it would need a CLI API for it, presumably dvc import --token=..., though not sure where it should store said token. Presumably not in dvc.yaml but in the system config? Would mean treating the repo URL like a data remote URL (i.e. give it a shortname, save creds in user config dirs, etc.)
@casperdcl This one:
"Or do you mean DVC's deps.*.repo.url is a private repo that needs a PAT for pull access? In which case I guess DVC could support a REPO_TOKEN env var for authentication the same way CML does. Plus it would need a CLI API for it, presumably dvc import --token=..., though not sure where it should store said token. Presumably not in dvc.yaml but in the system config? Would mean treating the repo URL like a data remote URL (i.e. give it a shortname, save creds in user config dirs, etc.)"
Although I believe the PAT / App Token should not be stored, since (in the case of the App Token) it will be re-generated every time the pipeline is run (e.g., in a GitHub Action). One idea for a solution could be to have an --import-token option that would work with other dvc commands (e.g., dvc repro). When passed, it would make sure that anything that was obtained with dvc import uses the passed token to authenticate when checking out the repo under the url key.
@dtrifiro Any idea how this should work after dulwich upgrades?
@dberenbaum
If you're thinking of support for git credential helpers, one way this could work is the following: git credential-cache, if cli git is available.
For example:
printf '[credential]\n\thelper = cache\n' >> ~/.gitconfig
printf "url=https://github.com\nusername=username\npassword=password\n" | git credential-cache store
dvc import https://github.com//[...]
This looks a bit clunky to me, although this would work starting with the next dvc release (see https://github.com/iterative/scmrepo/pull/138).
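For a CI job that receives a short-lived token, the same credential-helper idea could be sketched roughly as below. This is only an illustration under assumptions: it swaps git credential-cache for the store helper (which writes the token in plaintext to a file, so it is pointed at a throwaway HOME here), it assumes the token arrives in a REPO_TOKEN env var (the name floated earlier in this thread, not an existing DVC feature), and it assumes x-access-token as the username, which is what GitHub App installation tokens use over HTTPS. The final dvc import line is commented out and reuses the example names from the original post.

```shell
# Sketch only: hand a CI token to git's credential machinery.
# Assumptions: REPO_TOKEN is provided by the CI system; "x-access-token" is the username.
REPO_TOKEN="${REPO_TOKEN:-example-token}"

# Use a throwaway HOME so the sketch never touches the real git config.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME
export GIT_CONFIG_NOSYSTEM=1     # ignore system-wide credential helpers
export GIT_TERMINAL_PROMPT=0     # never fall back to an interactive prompt

# Swapped-in helper: "store" instead of "cache" (plaintext file under $HOME).
git config --global credential.helper store

# Record the credential for https://github.com.
printf 'protocol=https\nhost=github.com\nusername=x-access-token\npassword=%s\n\n' "$REPO_TOKEN" \
  | git credential approve

# Check that a lookup for the host now returns the stored credential.
fetched="$(printf 'protocol=https\nhost=github.com\n\n' | git credential fill)"
printf '%s\n' "$fetched" | grep -v '^password='   # show everything except the secret

# With credential helper support in dvc, an import could then follow, e.g.:
# dvc import https://github.com/username/DataRegistry some_data
```

Because the token is regenerated on every pipeline run, nothing needs to persist beyond the job; removing the temporary HOME at the end of the job discards the stored credential.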
An alternative would be setting up credentials sections in the dvc config that can be looked up when performing import or import-url, something like:
['credential "https://github.com"']
username = username
password = password
It might also be worth providing facilities to write values to the config, something like:
dvc config set credential.https://github.com username username
dvc config set credential.https://github.com password password
Cons with this approach:
[...] (man gitcredentials)
[...] (--local config)
Hm, in this case where there is an import from a data registry repo, can the token work over SSH, or would we need to convert to HTTP?
A similar report from a user who wants to dvc import from a private repo inside their CI environment: https://discord.com/channels/485586884165107732/485596304961962003/1057317845744238644.
hey, any update on having a new feature to import from private repository without using git ssh key?
@moisesrc13 The credential helper support mentioned above is now implemented, so you should be able to use that and authenticate to a private repo in the same ways you can using the git cli.
Thanks. Will give it a try.
I haven't seen any proposal of this kind in the issues and - based on my use case - it could solve a number of problems.
Scenario:
Problem:
[...] dvc import to obtain some_data from the Data Registry (call it: github.com/username/DataRegistry) [...] dvc.lock as [...]
Proposition:
[...] dvc import (or actually dvc pull?) checked for DATA_REGISTRY_TOKEN env variable and updated the url "on the fly" when pulling data from the remote.
Disclaimer: I was intending on writing this some months ago, at the time the desired behaviour was not in place. I did a quick look, but did not find any mention of it.
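To make the "on the fly" URL update in the proposition concrete, here is a minimal shell sketch of the rewrite itself. It is not DVC behaviour: with_token is a hypothetical helper, the https:// scheme on the registry URL is assumed, and so is the x-access-token username (the form GitHub App installation tokens use). It only rewrites HTTPS URLs and leaves anything else, such as SSH remotes, unchanged.

```shell
# Hypothetical helper (not part of DVC): embed a token in an HTTPS repo URL
# when DATA_REGISTRY_TOKEN is set; otherwise print the URL unchanged.
with_token() {
  url=$1
  if [ -n "${DATA_REGISTRY_TOKEN:-}" ] && [ "${url#https://}" != "$url" ]; then
    # Assumed credential form: https://x-access-token:<token>@host/path
    printf 'https://x-access-token:%s@%s\n' "$DATA_REGISTRY_TOKEN" "${url#https://}"
  else
    printf '%s\n' "$url"
  fi
}

# Example: the token exists only in the environment of this CI run.
DATA_REGISTRY_TOKEN=abc123
with_token https://github.com/username/DataRegistry
# prints: https://x-access-token:abc123@github.com/username/DataRegistry
```

The point of rewriting at pull time is that the URL recorded in dvc.lock can stay token-free, and a freshly generated token is applied only for the duration of the run.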
Thanks for your effort and please ask any questions in case you need clarification!