iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

Git credentials does not work on a dvc pull for an imported dvc-git repository #10439

Open marklani opened 1 month ago

marklani commented 1 month ago

Bug Report

Description

We have been using dvc 2.9.5 for a while and had to upgrade to dvc 3.50.2 because the pip-installed dvc 2.9.5 does not work in a Python virtual environment.

However, after upgrading to dvc 3.50.2, we found that dvc pull no longer works, possibly due to credential errors. Even though credentials are set in the global git config as shown below, it still asks for the username and password.

[credential "https://gitlab.com/repo"]
        useHttpPath = true
        helper = "!f() { echo \"username=ACCESS_TOKEN\"; echo \"password=${ACCESS_TOKEN}\"; }; f"

The $ACCESS_TOKEN is set as an environment variable.
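For readers unfamiliar with this kind of helper: it prints `key=value` lines in Git's credential format, which the Git client then reads. The following is a minimal Python sketch of that exchange, not code from DVC, scmrepo, or dulwich. The function names `simulate_helper` and `parse_credential_output` are hypothetical and exist only for illustration.

```python
def simulate_helper(token: str) -> str:
    """Mimic the output of the shell helper from the git config above.

    Hypothetical stand-in: the real helper is a shell function that echoes
    a fixed username and the value of the ACCESS_TOKEN environment variable.
    """
    return f"username=ACCESS_TOKEN\npassword={token}\n"


def parse_credential_output(text: str) -> dict[str, str]:
    """Parse key=value lines in Git's credential format into a dict."""
    creds: dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the list of attributes
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            creds[key] = value
    return creds


if __name__ == "__main__":
    # "example-token" is a placeholder, not a real credential.
    print(parse_credential_output(simulate_helper("example-token")))
```

A client that never runs the configured helper has nothing to parse, so it falls back to prompting for a username and password.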

If we re-initialize by running dvc import on the repo, it fails as well, for the same reason.

Reproduce

  1. dvc import https://gitlab.com/some-repo.git
  2. It will ask for credentials

Expected

dvc pull should work without asking for credentials

Environment information

Output of dvc doctor:

DVC version: 3.50.2 (pip)
-------------------------
Platform: Python 3.11.9 on Windows-10-10.0.22621-SP0
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.5
Supports:
        http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.5.0, boto3 = 1.34.106)
Config:
        Global: C:\Users\User\AppData\Local\iterative\dvc
        System: C:\ProgramData\iterative\dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\59881b4dde257e1c98c40e271dd0be55

Output of dvc import with verbose:

2024-05-24 17:36:46,321 DEBUG: v3.50.2 (pip), CPython 3.11.9 on Windows-10-10.0.22621-SP0
2024-05-24 17:36:46,322 DEBUG: command: C:\Projects\other-repo\venv\Scripts\dvc import https://gitlab.com/some-repo.git data -v
2024-05-24 17:36:50,007 DEBUG: Removing output 'data' of stage: 'data.dvc'.
2024-05-24 17:36:50,007 DEBUG: Removing 'C:\Projects\another-repo\data'
Importing 'data (https://gitlab.com/some-repo.git)' -> 'data'
2024-05-24 17:36:50,021 DEBUG: Computed stage: 'data.dvc' md5: '00be441c1e13be5d48543357a93ba182'
2024-05-24 17:36:50,021 DEBUG: 'md5' of stage: 'data.dvc' changed.
2024-05-24 17:36:50,023 DEBUG: Creating external repo https://gitlab.com/some-repo.git@None
2024-05-24 17:36:50,023 DEBUG: erepo: git clone 'https://gitlab.com/some-repo.git' to a temporary dir
Cloning data-dvc.git|                                                                  |0.00/? [00:00,      ?obj/s]Username for 'https://gitlab.com':
Password for 'https://gitlab.com':
2024-05-24 17:36:53,936 ERROR: failed to import 'data' - SCM error: Failed to clone repo 'https://gitlab.com/some-repo.git' to 'C:\Users\User\AppData\Local\Temp\tmpvf2blglvdvc-clone': No valid credentials provided
Traceback (most recent call last):
  File "C:\Projects\other\venv\Lib\site-packages\scmrepo\git\backend\dulwich\client.py", line 50, in _http_request
    result = super()._http_request(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dulwich\client.py", line 2303, in _http_request
    raise HTTPUnauthorized(resp.headers.get("WWW-Authenticate"), url)
dulwich.client.HTTPUnauthorized: No valid credentials provided

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Projects\other\venv\Lib\site-packages\scmrepo\git\backend\dulwich\__init__.py", line 260, in clone
    repo = clone_from()
           ^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dulwich\porcelain.py", line 546, in clone
    return client.clone(
           ^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dulwich\client.py", line 752, in clone
    result = self.fetch(path, target, progress=progress, depth=depth)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dulwich\client.py", line 840, in fetch
    result = self.fetch_pack(
             ^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dulwich\client.py", line 2157, in fetch_pack
    refs, server_capabilities, url = self._discover_references(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dulwich\client.py", line 2013, in _discover_references
    resp, read = self._http_request(url, headers)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\scmrepo\git\backend\dulwich\client.py", line 61, in _http_request
    result = super()._http_request(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dulwich\client.py", line 2303, in _http_request
    raise HTTPUnauthorized(resp.headers.get("WWW-Authenticate"), url)
dulwich.client.HTTPUnauthorized: No valid credentials provided

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Projects\other\venv\Lib\site-packages\dvc\scm.py", line 150, in clone
    git = Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\scmrepo\git\__init__.py", line 154, in clone
    backend.clone(url, to_path, bare=bare, mirror=mirror, **kwargs)
  File "C:\Projects\other\venv\Lib\site-packages\scmrepo\git\backend\dulwich\__init__.py", line 268, in clone
    raise CloneError(url, os.fsdecode(to_path)) from exc
scmrepo.exceptions.CloneError: Failed to clone repo 'https://gitlab.com/some-repodvc.git' to 'C:\Users\User\AppData\Local\Temp\tmpvf2blglvdvc-clone'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Projects\other\venv\Lib\site-packages\dvc\commands\imp.py", line 15, in run
    self.repo.imp(
  File "C:\Projects\other\venv\Lib\site-packages\dvc\repo\imp.py", line 44, in imp
    return self.imp_url(path, out=out, erepo=erepo, frozen=True, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\repo\__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\repo\scm_context.py", line 143, in run
    return method(repo, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\repo\imp_url.py", line 86, in imp_url
    stage.run(jobs=jobs, no_download=no_download)
  File "C:\Projects\other\venv\Lib\site-packages\funcy\decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\stage\decorators.py", line 44, in rwlocked
    return call()
           ^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\funcy\decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\stage\__init__.py", line 603, in run
    self._sync_import(dry, force, kwargs.get("jobs", None), no_download)
  File "C:\Projects\other\venv\Lib\site-packages\funcy\decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\stage\decorators.py", line 44, in rwlocked
    return call()
           ^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\funcy\decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\stage\__init__.py", line 640, in _sync_import
    sync_import(self, dry, force, jobs, no_download)
  File "C:\Projects\other\venv\Lib\site-packages\dvc\stage\imports.py", line 56, in sync_import
    stage.save_deps()
  File "C:\Projects\other\venv\Lib\site-packages\dvc\stage\__init__.py", line 496, in save_deps
    dep.save()
  File "C:\Projects\other\venv\Lib\site-packages\dvc\dependency\repo.py", line 63, in save
    rev = self.fs.repo.get_rev()
          ^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\fs\dvc.py", line 565, in repo
    return self.fs.repo
           ^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\fs\dvc.py", line 198, in repo
    repo = self._make_repo(**self._repo_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\fs\dvc.py", line 275, in _make_repo
    with Repo.open(uninitialized=True, **kwargs) as repo:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\repo\__init__.py", line 297, in open
    return open_repo(url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\repo\open_repo.py", line 60, in open_repo
    return _external_repo(url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\repo\open_repo.py", line 23, in _external_repo
    path = _cached_clone(url, rev)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\repo\open_repo.py", line 134, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\funcy\decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\funcy\flow.py", line 246, in wrap_with
    return call()
           ^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\funcy\decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\repo\open_repo.py", line 198, in _clone_default_branch
    git = clone(url, clone_path)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\other\venv\Lib\site-packages\dvc\scm.py", line 155, in clone
    raise CloneError("SCM error") from exc
dvc.scm.CloneError: SCM error

2024-05-24 17:36:53,963 DEBUG: Analytics is enabled.
2024-05-24 17:36:54,096 DEBUG: Trying to spawn ['daemon', 'analytics', 'C:\\Users\\User\\AppData\\Local\\Temp\\tmpvxkz8njy', '-v']
2024-05-24 17:36:54,101 DEBUG: Spawned ['daemon', 'analytics', 'C:\\Users\\User\\AppData\\Local\\Temp\\tmpvxkz8njy', '-v'] with pid 29688

Additional Information (if any):

shcheklein commented 1 month ago

Most likely this is because dulwich (one of the Git backends that we use to clone the repo) does not support credential helpers, e.g. https://github.com/jelmer/dulwich/issues/873
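One way to check this hypothesis is to ask the Git command-line client directly whether it can resolve credentials for the URL. The sketch below uses `git credential fill`, which runs the configured helpers. It is a diagnostic under stated assumptions, not how DVC or dulwich handle authentication: the function names are hypothetical, `git` is assumed to be on `PATH`, and `GIT_TERMINAL_PROMPT=0` is set so that Git fails instead of prompting when no helper answers.

```python
import os
import subprocess
from urllib.parse import urlsplit


def build_credential_request(url: str) -> str:
    """Build the stdin payload for `git credential fill` from a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"  # Git's host attribute includes the port
    lines = [f"protocol={parts.scheme}", f"host={host}"]
    path = parts.path.lstrip("/")
    if path:
        lines.append(f"path={path}")
    return "\n".join(lines) + "\n\n"  # a blank line ends the request


def git_credential_fill(url: str, timeout: float = 10.0) -> dict[str, str]:
    """Ask the Git CLI for credentials (hypothetical diagnostic helper)."""
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")  # never prompt
    proc = subprocess.run(
        ["git", "credential", "fill"],
        input=build_credential_request(url),
        capture_output=True,
        text=True,
        timeout=timeout,
        env=env,
        check=True,  # raises CalledProcessError if nothing was resolved
    )
    pairs = (line.partition("=") for line in proc.stdout.splitlines())
    return {key: value for key, sep, value in pairs if sep}


if __name__ == "__main__":
    creds = git_credential_fill("https://gitlab.com/some-repo.git")
    # Print only the attribute names to avoid leaking the token.
    print(sorted(creds))
```

If this returns a `username` and `password` while the clone inside DVC still prompts, the helper configuration is working and the difference lies in the backend that performs the clone.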

@skshetry do you remember, by chance, why we decided to do clones with dulwich?