luhuiguo closed this issue.
@luhuiguo could you please run it with -v and share the full stack trace? Thanks.
Is it the same error if you run it without setting up the proxy in Git?
$ git config unset --global http.proxy
$ git config unset --global https.proxy
$ git config list
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
$ git clone <GIT_REPOSITORY_URL>
Cloning into '<DIRECTORY>'...
fatal: unable to access '<GIT_REPOSITORY_URL>': Could not resolve host: <GITLAB_HOST>
$ dvc get -v <GIT_REPOSITORY_URL> <PATH>
2024-09-23 10:45:53,687 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
2024-09-23 10:45:53,687 DEBUG: command: get -v <GIT_REPOSITORY_URL> <PATH>
2024-09-23 10:45:53,946 DEBUG: Creating external repo <GIT_REPOSITORY_URL>@None
2024-09-23 10:45:53,946 DEBUG: erepo: git clone '<GIT_REPOSITORY_URL>' to a temporary dir
2024-09-23 10:46:18,913 ERROR: failed to get '<PATH>' - SCM error: Failed to clone repo '<GIT_REPOSITORY_URL>' to '/tmp/tmpibarss8odvc-clone': HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)")): HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)")): <urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known): [Errno -2] Name or service not known
Traceback (most recent call last):
File "urllib3/connection.py", line 196, in _new_conn
File "urllib3/util/connection.py", line 60, in create_connection
File "socket.py", line 955, in getaddrinfo
socket.gaierror: [Errno -2] Name or service not known
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 789, in urlopen
File "urllib3/connectionpool.py", line 490, in _make_request
File "urllib3/connectionpool.py", line 466, in _make_request
File "urllib3/connectionpool.py", line 1095, in _validate_conn
File "urllib3/connection.py", line 615, in connect
File "urllib3/connection.py", line 203, in _new_conn
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dulwich/client.py", line 2290, in _http_request
File "urllib3/_request_methods.py", line 136, in request
File "urllib3/_request_methods.py", line 183, in request_encode_url
File "urllib3/poolmanager.py", line 443, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 843, in urlopen
File "urllib3/util/retry.py", line 519, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)"))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "scmrepo/git/backend/dulwich/__init__.py", line 260, in clone
File "dulwich/porcelain.py", line 546, in clone
File "dulwich/client.py", line 752, in clone
File "dulwich/client.py", line 840, in fetch
File "dulwich/client.py", line 2157, in fetch_pack
File "dulwich/client.py", line 2013, in _discover_references
File "scmrepo/git/backend/dulwich/client.py", line 50, in _http_request
File "dulwich/client.py", line 2298, in _http_request
dulwich.errors.GitProtocolError: HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)"))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/scm.py", line 150, in clone
File "scmrepo/git/__init__.py", line 154, in clone
File "scmrepo/git/backend/dulwich/__init__.py", line 268, in clone
scmrepo.exceptions.CloneError: Failed to clone repo '<GIT_REPOSITORY_URL>' to '/tmp/tmpibarss8odvc-clone'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/commands/get.py", line 37, in _get_file_from_repo
File "dvc/repo/get.py", line 45, in get
File "dvc/repo/__init__.py", line 302, in open
File "dvc/repo/open_repo.py", line 60, in open_repo
File "contextlib.py", line 79, in inner
File "dvc/repo/open_repo.py", line 23, in _external_repo
File "dvc/repo/open_repo.py", line 134, in _cached_clone
File "funcy/decorators.py", line 47, in wrapper
File "funcy/flow.py", line 246, in wrap_with
File "funcy/decorators.py", line 68, in __call__
File "dvc/repo/open_repo.py", line 198, in _clone_default_branch
File "dvc/scm.py", line 155, in clone
dvc.scm.CloneError: SCM error
2024-09-23 10:46:18,950 DEBUG: Analytics is enabled.
2024-09-23 10:46:18,952 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpggwvzttp', '-v']
2024-09-23 10:46:18,962 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpggwvzttp', '-v'] with pid 201
$ git config --global http.proxy http://10.3.12.8:3128
$ git config --global https.proxy http://10.3.12.8:3128
$ git config list
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
http.proxy=http://10.3.12.8:3128
https.proxy=http://10.3.12.8:3128
$ git clone <GIT_REPOSITORY_URL>
Cloning into '<DIRECTORY>'...
remote: Enumerating objects: 166, done.
remote: Counting objects: 100% (133/133), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 166 (delta 42), reused 0 (delta 0), pack-reused 33
Receiving objects: 100% (166/166), 11.08 MiB | 874.00 KiB/s, done.
Resolving deltas: 100% (48/48), done.
$ dvc get -v <GIT_REPOSITORY_URL> <PATH>
2024-09-23 10:40:27,336 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
2024-09-23 10:40:27,336 DEBUG: command: get -v <GIT_REPOSITORY_URL> <PATH>
2024-09-23 10:40:27,486 DEBUG: Creating external repo <GIT_REPOSITORY_URL>@None
2024-09-23 10:40:27,486 DEBUG: erepo: git clone '<GIT_REPOSITORY_URL>' to a temporary dir
2024-09-23 10:41:06,026 ERROR: unexpected error - HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)")): HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)")): <urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known): [Errno -2] Name or service not known
Traceback (most recent call last):
File "urllib3/connection.py", line 196, in _new_conn
File "urllib3/util/connection.py", line 60, in create_connection
File "socket.py", line 955, in getaddrinfo
socket.gaierror: [Errno -2] Name or service not known
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 789, in urlopen
File "urllib3/connectionpool.py", line 490, in _make_request
File "urllib3/connectionpool.py", line 466, in _make_request
File "urllib3/connectionpool.py", line 1095, in _validate_conn
File "urllib3/connection.py", line 615, in connect
File "urllib3/connection.py", line 203, in _new_conn
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dulwich/client.py", line 2290, in _http_request
File "urllib3/_request_methods.py", line 136, in request
File "urllib3/_request_methods.py", line 183, in request_encode_url
File "urllib3/poolmanager.py", line 443, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 843, in urlopen
File "urllib3/util/retry.py", line 519, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)"))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/cli/__init__.py", line 211, in main
File "dvc/cli/command.py", line 41, in do_run
File "dvc/commands/get.py", line 30, in run
File "dvc/commands/get.py", line 37, in _get_file_from_repo
File "dvc/repo/get.py", line 45, in get
File "dvc/repo/__init__.py", line 302, in open
File "dvc/repo/open_repo.py", line 60, in open_repo
File "contextlib.py", line 79, in inner
File "dvc/repo/open_repo.py", line 23, in _external_repo
File "dvc/repo/open_repo.py", line 134, in _cached_clone
File "funcy/decorators.py", line 47, in wrapper
File "funcy/flow.py", line 246, in wrap_with
File "funcy/decorators.py", line 68, in __call__
File "dvc/repo/open_repo.py", line 198, in _clone_default_branch
File "dvc/scm.py", line 152, in clone
File "dvc/repo/experiments/utils.py", line 275, in fetch_all_exps
File "dvc/repo/experiments/utils.py", line 275, in <listcomp>
File "dvc/repo/experiments/utils.py", line 119, in iter_remote_refs
File "scmrepo/git/backend/dulwich/__init__.py", line 590, in iter_remote_refs
File "dulwich/client.py", line 2208, in get_refs
File "dulwich/client.py", line 2013, in _discover_references
File "scmrepo/git/backend/dulwich/client.py", line 50, in _http_request
File "dulwich/client.py", line 2298, in _http_request
dulwich.errors.GitProtocolError: HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)"))
2024-09-23 10:41:06,128 DEBUG: Version info for developers:
DVC version: 3.55.2 (deb)
-------------------------
Platform: Python 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
Subprojects:
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.17.1),
gdrive (pydrive2 = 1.20.0),
gs (gcsfs = 2024.6.1),
hdfs (fsspec = 2024.6.1, pyarrow = 17.0.0),
http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.6.1, boto3 = 1.35.7),
ssh (sshfs = 2024.6.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.6.1)
Config:
Global: /home/luhg/.config/dvc
System: /etc/xdg/dvc
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-09-23 10:41:06,145 DEBUG: Analytics is enabled.
2024-09-23 10:41:06,147 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp8y57xigq', '-v']
2024-09-23 10:41:06,155 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp8y57xigq', '-v'] with pid 174
Okay, it probably needs to be fixed on the dulwich side (https://github.com/jelmer/dulwich).
Could you try running it with the HTTP_PROXY and HTTPS_PROXY env vars set? I think dulwich supports those.
For now, you could probably create an alias for dvc get that sets those env vars. Would that work for you?
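A minimal Python sketch of that workaround, assuming (as suggested above, not confirmed in this thread) that dulwich honours the standard proxy environment variables. It injects HTTP_PROXY/HTTPS_PROXY into the child process only, so the rest of the shell session is unaffected. The helper names are hypothetical and the proxy URL is the one used in this thread.

```python
import os
import subprocess
from typing import Mapping, Optional

# Variable names commonly checked by HTTP clients; both cases are set
# because some tools read only the lowercase form.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def proxied_env(proxy_url: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with the proxy vars set."""
    env = dict(os.environ if base is None else base)
    for name in PROXY_VARS:
        env[name] = proxy_url
    return env


def dvc_get_via_proxy(repo_url: str, path: str, proxy_url: str) -> int:
    """Hypothetical wrapper: run `dvc get` with the proxy vars set for this call only."""
    result = subprocess.run(["dvc", "get", repo_url, path], env=proxied_env(proxy_url))
    return result.returncode
```

Calling dvc_get_via_proxy("<GIT_REPOSITORY_URL>", "<PATH>", "http://10.3.12.8:3128") is roughly what a shell alias that prefixes dvc get with HTTP_PROXY=... HTTPS_PROXY=... would do.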
I've created an issue upstream https://github.com/jelmer/dulwich/issues/1368
Okay, it seems that a proxy set via the global Git config should be supported. I tried this:
(.venv) √ Projects/test-dvc-get % dvc import https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
ERROR: failed to import 'get-started/data.xml' from 'https://github.com/iterative/dataset-registry'. - stage working dir '/Users/ivan/Projects/test-dvc-get/data' does not exist
(.venv) ?1 Projects/test-dvc-get % mkdir data
(.venv) √ Projects/test-dvc-get % dvc import https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
Importing 'get-started/data.xml (https://github.com/iterative/dataset-registry)' -> 'data/data.xml'
ERROR: failed to import 'get-started/data.xml' - SCM error: Failed to clone repo 'https://github.com/iterative/dataset-registry' to '/var/folders/8f/fbysfztx1mb953_gpwl477p80000gn/T/tmphwwt1qxrdvc-clone': HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x105ce6330>: Failed to establish a new connection: [Errno 51] Network is unreachable'))): HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x105ce6330>: Failed to establish a new connection: [Errno 51] Network is unreachable'))): ('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x105ce6330>: Failed to establish a new connection: [Errno 51] Network is unreachable')): <urllib3.connection.HTTPSConnection object at 0x105ce6330>: Failed to establish a new connection: [Errno 51] Network is unreachable: [Errno 51] Network is unreachable
after running:
$ git config --global http.proxy http://10.3.12.8:3128
$ git config --global https.proxy http://10.3.12.8:3128
So it does pick up the proxy and tries to connect to it (and fails).
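When comparing setups it can help to check which proxy value Git itself would use. A small Python sketch, assuming the key=value output format of git config list shown in this thread: for single-valued keys Git uses the last value it reads, so later duplicates override earlier ones. As far as I know Git only consults http.proxy (and per-URL http.<url>.proxy); https.proxy is not a Git setting, so those lines are harmless but unused. The function names are illustrative.

```python
from typing import Dict, List, Optional


def parse_config_list(text: str) -> Dict[str, List[str]]:
    """Parse `git config list` output into key -> all values, in file order.

    Lines without '=' (valueless boolean keys) are skipped in this sketch.
    """
    values: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only; values may contain '=' themselves.
        key, _, value = line.partition("=")
        values.setdefault(key, []).append(value)
    return values


def effective_http_proxy(text: str) -> Optional[str]:
    """Return the http.proxy value Git would use (last one wins), or None."""
    found = parse_config_list(text).get("http.proxy")
    return found[-1] if found else None
```

Feeding it the git config list output from the Docker reproduction (which lists the filter.lfs.* keys twice, once per config file) returns http://10.3.12.8:3128.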
We need a simpler way to reproduce this for research, e.g. a way to run a local proxy to experiment with.
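A minimal sketch of such a local proxy using only the Python standard library, as a lighter alternative to a full proxy such as squid. It accepts HTTP CONNECT requests, records each requested host:port in CONNECT_LOG, resolves and dials the target itself, then relays bytes in both directions. It has no authentication, no plain-HTTP forwarding and no access control, so it should only be bound to 127.0.0.1. All class and variable names are made up for this sketch.

```python
import select
import socket
import socketserver
import threading

# (host, port) pairs requested via CONNECT, in arrival order.
CONNECT_LOG = []
_LOG_LOCK = threading.Lock()


def relay(a: socket.socket, b: socket.socket, idle_timeout: float = 30.0) -> None:
    """Copy bytes in both directions until one side closes or both stay idle."""
    peers = {a: b, b: a}
    while True:
        readable, _, _ = select.select(list(peers), [], [], idle_timeout)
        if not readable:
            return
        for sock in readable:
            data = sock.recv(65536)
            if not data:
                return
            peers[sock].sendall(data)


class LoggingConnectProxy(socketserver.StreamRequestHandler):
    """Handle one client: parse CONNECT, log the target, then tunnel."""

    def handle(self) -> None:
        request_line = self.rfile.readline().decode("latin-1").strip()
        # Discard the remaining request headers (up to the blank line).
        while self.rfile.readline() not in (b"\r\n", b"\n", b""):
            pass
        parts = request_line.split()
        if len(parts) != 3 or parts[0].upper() != "CONNECT":
            self.wfile.write(b"HTTP/1.1 405 Method Not Allowed\r\n\r\n")
            return
        host, _, port = parts[1].rpartition(":")
        if not host or not port.isdigit():
            self.wfile.write(b"HTTP/1.1 400 Bad Request\r\n\r\n")
            return
        host = host.strip("[]")  # IPv6 literals arrive as [addr]:port
        with _LOG_LOCK:
            CONNECT_LOG.append((host, int(port)))
        print(f"CONNECT {host}:{port}", flush=True)
        try:
            # Name resolution for `host` happens here, on the proxy side.
            upstream = socket.create_connection((host, int(port)), timeout=10)
        except OSError:
            self.wfile.write(b"HTTP/1.1 502 Bad Gateway\r\n\r\n")
            return
        self.wfile.write(b"HTTP/1.1 200 Connection established\r\n\r\n")
        with upstream:
            relay(self.connection, upstream)


class ProxyServer(socketserver.ThreadingTCPServer):
    daemon_threads = True       # don't block interpreter exit on open tunnels
    allow_reuse_address = True
```

Started with ProxyServer(("127.0.0.1", 3128), LoggingConnectProxy).serve_forever() and with git config --global http.proxy http://127.0.0.1:3128, every request that really goes through the proxy prints a CONNECT line; a dvc get request that never shows up there was made directly.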
My error message:
[Errno -2] Name or service not known
and yours:
[Errno 51] Network is unreachable
We run a self-managed GitLab instance on the company intranet and rely on the intranet DNS for name resolution.
The GitLab hostname is not resolvable outside the company intranet.
On my PC:
$ ping <GITLAB_HOSTNAME>
ping: <GITLAB_HOSTNAME>: Name or service not known
On the proxy server:
$ ping <GITLAB_HOSTNAME>
PING <GITLAB_HOSTNAME> (192.168.57.131) 56(84) bytes of data.
64 bytes from <GITLAB_HOSTNAME> (192.168.57.131): icmp_seq=1 ttl=59 time=3.08 ms
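This is the setup an HTTP proxy is meant to handle: for an HTTPS URL the client connects to the proxy and sends CONNECT host:443, and the proxy resolves the name. As far as I understand, a properly proxied client therefore never needs <GITLAB_HOSTNAME> to resolve on the PC, so a local NameResolutionError for that host suggests the request bypassed the proxy. A minimal client-side sketch of that handshake; the function names are illustrative, not dulwich or urllib3 APIs.

```python
import socket
from typing import Tuple


def build_connect_request(host: str, port: int = 443) -> bytes:
    """CONNECT request asking the proxy to open a TCP tunnel to host:port."""
    authority = f"{host}:{port}"
    return (f"CONNECT {authority} HTTP/1.1\r\n"
            f"Host: {authority}\r\n\r\n").encode("ascii")


def parse_status_line(response: bytes) -> Tuple[int, str]:
    """Return (status code, reason) from the proxy's first response line."""
    first_line = response.split(b"\r\n", 1)[0].decode("latin-1")
    _version, code, *reason = first_line.split(" ", 2)
    return int(code), reason[0] if reason else ""


def open_tunnel(proxy_host: str, proxy_port: int, target_host: str,
                target_port: int = 443, timeout: float = 10.0) -> socket.socket:
    """Open a tunnel through an HTTP proxy.

    Only proxy_host is resolved locally; target_host is sent to the proxy as text.
    """
    sock = socket.create_connection((proxy_host, proxy_port), timeout=timeout)
    sock.sendall(build_connect_request(target_host, target_port))
    response = b""
    while b"\r\n\r\n" not in response:
        chunk = sock.recv(4096)
        if not chunk:
            sock.close()
            raise ConnectionError("proxy closed the connection")
        response += chunk
    code, reason = parse_status_line(response)
    if code != 200:
        sock.close()
        raise ConnectionError(f"proxy refused CONNECT: {code} {reason}")
    return sock
```

The returned socket can then be wrapped with ssl.create_default_context().wrap_socket(sock, server_hostname="<GITLAB_HOSTNAME>"); because the certificate is checked against the hostname rather than an IP, this path would not hit the IP-mismatch error shown further down.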
After changing the domain name in the Git URL to the IP address:
$ dvc get <GITLAB_URL_WITH_IPADDRESS> <PATH>
ERROR: failed to get '<PATH>' - SCM error: Failed to clone repo '<GITLAB_URL_WITH_IP>' to '/tmp/tmp3u5l98ondvc-clone': HTTPSConnectionPool(host='192.168.57.131', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: IP address mismatch, certificate is not valid for '192.168.57.131'. (_ssl.c:997)"))): HTTPSConnectionPool(host='192.168.57.131', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: IP address mismatch, certificate is not valid for '192.168.57.131'. (_ssl.c:997)"))): [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: IP address mismatch, certificate is not valid for '192.168.57.131'. (_ssl.c:997)
It seems that I can connect to the GitLab server, but the certificate is not valid for the IP address.
Yes, it seems so, but it's hard to tell why it is trying to resolve the hostname on the local machine instead of on the proxy.
As a workaround, could you try adding the hostname to /etc/hosts?
Otherwise, we need a simple local proxy setup to reproduce this.
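For the /etc/hosts workaround, a small Python sketch that parses hosts-file text (an IP address followed by one or more names, with # starting a comment) and reports which address a name maps to. It only reads the file; whether the system resolver consults /etc/hosts first depends on /etc/nsswitch.conf. The function names are illustrative.

```python
from typing import Dict, Optional


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname to the first address listed for it."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first match, mirroring how the hosts file is usually read.
            mapping.setdefault(name.lower(), address)
    return mapping


def lookup_in_hosts(name: str, path: str = "/etc/hosts") -> Optional[str]:
    """Return the address `name` maps to in the hosts file, or None."""
    with open(path, encoding="utf-8") as fh:
        return parse_hosts(fh.read()).get(name.lower())
```

After appending 192.168.57.131 <GITLAB_HOSTNAME> to /etc/hosts, lookup_in_hosts("<GITLAB_HOSTNAME>") should return 192.168.57.131.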
$ docker run -it --rm -v ${PWD}:/workspace <DVC_IMAGE> bash
Dockerfile
FROM ubuntu:24.04
RUN apt update && apt install -y gpg curl wget software-properties-common iputils-ping
RUN add-apt-repository -y ppa:git-core/ppa && apt update && apt install -y git
RUN git config --global user.email "<USER_EMAIL>" && git config --global user.name "<USER_NAME>"
RUN curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
RUN apt update && apt install -y git-lfs && git lfs install
RUN wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list && \
    wget -qO - https://dvc.org/deb/iterative.asc | gpg --dearmor > packages.iterative.gpg && \
    install -o root -g root -m 644 packages.iterative.gpg /etc/apt/trusted.gpg.d/ && \
    rm -f packages.iterative.gpg
RUN apt update && apt install -y dvc
RUN mkdir -p /workspace
WORKDIR /workspace
Everything works fine on other computers, but on this server, due to its network configuration, the GitLab server cannot be accessed directly.
$ ping <GITLAB_HOSTNAME>
ping: <GITLAB_HOSTNAME>: Name or service not known
$ ping 192.168.57.131
PING 192.168.57.131 (192.168.57.131) 56(84) bytes of data.
--- 192.168.57.131 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
$ curl <GITLAB_REPOSITORY_URL>
curl: (6) Could not resolve host: <GITLAB_HOSTNAME>
$ git clone <GIT_REPOSITORY_URL>
Cloning into '<DIRECTORY>'...
fatal: unable to access '<GIT_REPOSITORY_URL>': Could not resolve host: <GITLAB_HOSTNAME>
$ git config --global http.proxy http://10.3.12.8:3128
$ git config --global https.proxy http://10.3.12.8:3128
$ git clone <GITLAB_REPOSITORY_URL>
Cloning into '<REPOSITORY>'...
remote: Enumerating objects: 166, done.
remote: Counting objects: 100% (133/133), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 166 (delta 42), reused 0 (delta 0), pack-reused 33
Receiving objects: 100% (166/166), 11.08 MiB | 871.00 KiB/s, done.
Resolving deltas: 100% (48/48), done.
$ curl <GITLAB_REPOSITORY_URL>
curl: (6) Could not resolve host: <GITLAB_HOSTNAME>
$ export HTTP_PROXY=http://10.3.12.8:3128
$ export HTTPS_PROXY=http://10.3.12.8:3128
$ curl <GITLAB_REPOSITORY_URL>
<html><body>You are being <a href="https://<GITLAB_HOSTNAME>/users/sign_in">redirected</a>.</body></html>
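curl only reached GitLab once the environment variables were set, because curl does not read Git's config. Python's standard library exposes the same convention: urllib.request.getproxies() and proxy_bypass() read the *_proxy and no_proxy variables. A sketch of a helper reporting which proxy such a client would choose for a URL; whether dulwich or dvc follow exactly these rules is an assumption, not something shown in this thread.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlsplit


def env_proxy_for(url: str) -> Optional[str]:
    """Proxy a stdlib-style client would pick for `url` from *_proxy env vars.

    Returns None when no proxy is configured for the scheme or when the
    host matches no_proxy.
    """
    parts = urlsplit(url)
    if parts.hostname and urllib.request.proxy_bypass(parts.hostname):
        return None
    return urllib.request.getproxies().get(parts.scheme)
```

With the two exports above in place, env_proxy_for("<GITLAB_REPOSITORY_URL>") would be expected to return http://10.3.12.8:3128.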
$ dvc get <GITLAB_REPOSITORY_URL> <PATH>
ERROR: unexpected error - HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /JYAI/data-registry/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f85816334c0>: Failed to resolve '<GITLAB_HOSTNAME>' ([Errno -2] Name or service not known)")): HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f85816334c0>: Failed to resolve '<GITLAB_HOSTNAME>' ([Errno -2] Name or service not known)")): <urllib3.connection.HTTPSConnection object at 0x7f85816334c0>: Failed to resolve '<GITLAB_HOSTNAME>' ([Errno -2] Name or service not known): [Errno -2] Name or service not known
$ echo "192.168.57.131 <GITLAB_HOSTNAME>">> /etc/hosts
$ ping <GITLAB_HOSTNAME>
PING <GITLAB_HOSTNAME> (192.168.57.131) 56(84) bytes of data.
$ dvc get -v <GITLAB_REPOSITORY_URL> <PATH>
2024-09-25 12:52:05,038 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
2024-09-25 12:52:05,038 DEBUG: command: get -v <GITLAB_REPOSITORY_URL> <PATH>
2024-09-25 12:52:05,187 DEBUG: Creating external repo <GITLAB_REPOSITORY_URL>@None
2024-09-25 12:52:05,187 DEBUG: erepo: git clone '<GITLAB_REPOSITORY_URL>' to a temporary dir
Cloning data-registry.git|█████████████████████████████████████████████████████████████████████████████████████████| Compressing |119/119 [00:00, 3.01obj/s]
2024-09-25 13:00:47,786 ERROR: unexpected error - HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)')): HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)')): (<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)'): [Errno 110] Connection timed out
Traceback (most recent call last):
File "urllib3/connection.py", line 199, in _new_conn
File "urllib3/util/connection.py", line 85, in create_connection
File "urllib3/util/connection.py", line 73, in create_connection
TimeoutError: [Errno 110] Connection timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 789, in urlopen
File "urllib3/connectionpool.py", line 490, in _make_request
File "urllib3/connectionpool.py", line 466, in _make_request
File "urllib3/connectionpool.py", line 1095, in _validate_conn
File "urllib3/connection.py", line 693, in connect
File "urllib3/connection.py", line 208, in _new_conn
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dulwich/client.py", line 2290, in _http_request
File "urllib3/_request_methods.py", line 135, in request
File "urllib3/_request_methods.py", line 182, in request_encode_url
File "urllib3/poolmanager.py", line 443, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 843, in urlopen
File "urllib3/util/retry.py", line 519, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)'))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/cli/__init__.py", line 211, in main
File "dvc/cli/command.py", line 41, in do_run
File "dvc/commands/get.py", line 30, in run
File "dvc/commands/get.py", line 37, in _get_file_from_repo
File "dvc/repo/get.py", line 45, in get
File "dvc/repo/__init__.py", line 302, in open
File "dvc/repo/open_repo.py", line 60, in open_repo
File "contextlib.py", line 79, in inner
File "dvc/repo/open_repo.py", line 23, in _external_repo
File "dvc/repo/open_repo.py", line 134, in _cached_clone
File "funcy/decorators.py", line 47, in wrapper
File "funcy/flow.py", line 246, in wrap_with
File "funcy/decorators.py", line 68, in __call__
File "dvc/repo/open_repo.py", line 198, in _clone_default_branch
File "dvc/scm.py", line 152, in clone
File "dvc/repo/experiments/utils.py", line 275, in fetch_all_exps
File "dvc/repo/experiments/utils.py", line 275, in <listcomp>
File "dvc/repo/experiments/utils.py", line 119, in iter_remote_refs
File "scmrepo/git/backend/dulwich/__init__.py", line 590, in iter_remote_refs
File "dulwich/client.py", line 2208, in get_refs
File "dulwich/client.py", line 2013, in _discover_references
File "scmrepo/git/backend/dulwich/client.py", line 50, in _http_request
File "dulwich/client.py", line 2298, in _http_request
dulwich.errors.GitProtocolError: HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)'))
2024-09-25 13:00:47,888 DEBUG: Version info for developers:
DVC version: 3.55.2 (deb)
-------------------------
Platform: Python 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
Subprojects:
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.18.0),
gdrive (pydrive2 = 1.20.0),
gs (gcsfs = 2024.9.0.post1),
hdfs (fsspec = 2024.9.0, pyarrow = 17.0.0),
http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.9.0, boto3 = 1.35.23),
ssh (sshfs = 2024.6.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.9.0)
Config:
Global: /root/.config/dvc
System: /etc/xdg/dvc
I still think there are some special cases where the proxy doesn't work.
Yes, right. Or it does pick up the proxy, but for whatever reason tries to resolve the hostname locally, while that should happen on the proxy machine (?).
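One clue in the logs above: without a proxy, the failure is raised inside scmrepo's clone (scmrepo/git/backend/dulwich/__init__.py, clone), while with the proxy the clone gets further and the error comes from dvc/repo/experiments/utils.py (fetch_all_exps, iter_remote_refs) and dulwich get_refs. That may mean the follow-up ref listing does not use the same proxy settings as the clone; this is a hypothesis, not a confirmed diagnosis. A small, hypothetical Python helper that extracts (file, function) frames from pasted tracebacks makes such comparisons quicker; line numbers are ignored so different library versions still line up.

```python
import re
from typing import List, Tuple

# Matches lines such as: File "dulwich/client.py", line 2208, in get_refs
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\S+)')


def frames(traceback_text: str) -> List[Tuple[str, str]]:
    """Return (file, function) pairs in the order they appear."""
    return [(m.group(1), m.group(3)) for m in FRAME_RE.finditer(traceback_text)]


def only_in(a: str, b: str) -> List[Tuple[str, str]]:
    """Frames present in traceback `a` but absent from traceback `b`."""
    seen = set(frames(b))
    return [f for f in frames(a) if f not in seen]
```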
Still not working
🤔
I really need a way to reproduce this locally; then I'm pretty sure I can find the cause faster. If you have an idea how to run a proxy on my machine to experiment with, that would help a lot.
$ docker run -d --name squid-container -e TZ=UTC -p 3128:3128 ubuntu/squid
$ docker run -it --rm luhuiguo/dvc bash
root@bd4ec17f398c:/workspace# dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:25:27,589 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
2024-09-26 12:25:27,589 DEBUG: command: get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:25:27,697 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2024-09-26 12:25:27,697 DEBUG: erepo: git clone 'https://github.com/iterative/dataset-registry' to a temporary dir
2024-09-26 12:25:49,445 DEBUG: Analytics is enabled.
2024-09-26 12:25:49,446 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpl2zikt7c', '-v']
2024-09-26 12:25:49,452 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpl2zikt7c', '-v'] with pid 234
2024-09-26 12:25:49,454 DEBUG: Removing '/tmp/tmpb241nh80dvc-clone'
2024-09-26 12:25:49,457 DEBUG: Removing '/tmp/tmpu6ugkwyrdvc-cache'
root@bd4ec17f398c:/workspace# rm -rf data
root@bd4ec17f398c:/workspace# echo "127.0.0.1 github.com">> /etc/hosts
root@bd4ec17f398c:/workspace# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 bd4ec17f398c
127.0.0.1 github.com
root@bd4ec17f398c:/workspace# git clone -v https://github.com/iterative/dataset-registry
Cloning into 'dataset-registry'...
fatal: unable to access 'https://github.com/iterative/dataset-registry/': Failed to connect to github.com port 443 after 0 ms: Couldn't connect to server
root@bd4ec17f398c:/workspace# dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
ERROR: failed to get 'get-started/data.xml' - SCM error: Failed to clone repo 'https://github.com/iterative/dataset-registry' to '/tmp/tmpiagkrm4pdvc-clone': HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f23ea1fc400>: Failed to establish a new connection: [Errno 111] Connection refused')): HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f23ea1fc400>: Failed to establish a new connection: [Errno 111] Connection refused')): <urllib3.connection.HTTPSConnection object at 0x7f23ea1fc400>: Failed to establish a new connection: [Errno 111] Connection refused: [Errno 111] Connection refused
root@bd4ec17f398c:/workspace# git config --global http.proxy http://10.3.12.8:3128
root@bd4ec17f398c:/workspace# git config --global https.proxy http://10.3.12.8:3128
root@bd4ec17f398c:/workspace# git config list
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
filter.lfs.clean=git-lfs clean -- %f
user.email=luhuiguo@gmail.com
user.name=luhuiguo
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
http.proxy=http://10.3.12.8:3128
https.proxy=http://10.3.12.8:3128
root@bd4ec17f398c:/workspace# git clone -v https://github.com/iterative/dataset-registry
Cloning into 'dataset-registry'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (gzip 1202 to 636 bytes)
remote: Enumerating objects: 328, done.
remote: Counting objects: 100% (123/123), done.
remote: Compressing objects: 100% (84/84), done.
remote: Total 328 (delta 53), reused 61 (delta 38), pack-reused 205 (from 1)
Receiving objects: 100% (328/328), 50.37 KiB | 606.00 KiB/s, done.
Resolving deltas: 100% (85/85), done.
root@bd4ec17f398c:/workspace# dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:39:41,732 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
2024-09-26 12:39:41,732 DEBUG: command: get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:39:41,837 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2024-09-26 12:39:41,837 DEBUG: erepo: git clone 'https://github.com/iterative/dataset-registry' to a temporary dir
2024-09-26 12:39:43,436 ERROR: unexpected error - HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused')): HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused')): <urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused: [Errno 111] Connection refused
Traceback (most recent call last):
File "urllib3/connection.py", line 199, in _new_conn
File "urllib3/util/connection.py", line 85, in create_connection
File "urllib3/util/connection.py", line 73, in create_connection
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 789, in urlopen
File "urllib3/connectionpool.py", line 490, in _make_request
File "urllib3/connectionpool.py", line 466, in _make_request
File "urllib3/connectionpool.py", line 1095, in _validate_conn
File "urllib3/connection.py", line 693, in connect
File "urllib3/connection.py", line 214, in _new_conn
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dulwich/client.py", line 2290, in _http_request
File "urllib3/_request_methods.py", line 135, in request
File "urllib3/_request_methods.py", line 182, in request_encode_url
File "urllib3/poolmanager.py", line 443, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 843, in urlopen
File "urllib3/util/retry.py", line 519, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/cli/__init__.py", line 211, in main
File "dvc/cli/command.py", line 41, in do_run
File "dvc/commands/get.py", line 30, in run
File "dvc/commands/get.py", line 37, in _get_file_from_repo
File "dvc/repo/get.py", line 45, in get
File "dvc/repo/__init__.py", line 302, in open
File "dvc/repo/open_repo.py", line 60, in open_repo
File "contextlib.py", line 79, in inner
File "dvc/repo/open_repo.py", line 23, in _external_repo
File "dvc/repo/open_repo.py", line 134, in _cached_clone
File "funcy/decorators.py", line 47, in wrapper
File "funcy/flow.py", line 246, in wrap_with
File "funcy/decorators.py", line 68, in __call__
File "dvc/repo/open_repo.py", line 198, in _clone_default_branch
File "dvc/scm.py", line 152, in clone
File "dvc/repo/experiments/utils.py", line 275, in fetch_all_exps
File "dvc/repo/experiments/utils.py", line 275, in <listcomp>
File "dvc/repo/experiments/utils.py", line 119, in iter_remote_refs
File "scmrepo/git/backend/dulwich/__init__.py", line 590, in iter_remote_refs
File "dulwich/client.py", line 2208, in get_refs
File "dulwich/client.py", line 2013, in _discover_references
File "scmrepo/git/backend/dulwich/client.py", line 50, in _http_request
File "dulwich/client.py", line 2298, in _http_request
dulwich.errors.GitProtocolError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2024-09-26 12:39:43,464 DEBUG: Version info for developers:
DVC version: 3.55.2 (deb)
-------------------------
Platform: Python 3.10.8 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
Subprojects:
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.18.0),
gdrive (pydrive2 = 1.20.0),
gs (gcsfs = 2024.9.0.post1),
hdfs (fsspec = 2024.9.0, pyarrow = 17.0.0),
http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.9.0, boto3 = 1.35.23),
ssh (sshfs = 2024.6.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.9.0)
Config:
Global: /root/.config/dvc
System: /etc/xdg/dvc
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-09-26 12:39:43,469 DEBUG: Analytics is enabled.
2024-09-26 12:39:43,470 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpk4n384y7', '-v']
2024-09-26 12:39:43,475 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpk4n384y7', '-v'] with pid 342
root@bd4ec17f398c:/workspace# echo "$(sed '/github.com/d' /etc/hosts)" > /etc/hosts
root@bd4ec17f398c:/workspace# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 bd4ec17f398c
root@bd4ec17f398c:/workspace# git config list
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
filter.lfs.clean=git-lfs clean -- %f
user.email=luhuiguo@gmail.com
user.name=luhuiguo
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
http.proxy=http://10.3.12.8:3128
https.proxy=http://10.3.12.8:3128
root@bd4ec17f398c:/workspace# dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:47:29,915 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
2024-09-26 12:47:29,915 DEBUG: command: get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:47:30,025 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2024-09-26 12:47:30,025 DEBUG: erepo: git clone 'https://github.com/iterative/dataset-registry' to a temporary dir
2024-09-26 12:50:50,305 DEBUG: Analytics is enabled.
2024-09-26 12:50:50,305 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp29_f6y8i', '-v']
2024-09-26 12:50:50,311 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp29_f6y8i', '-v'] with pid 516
2024-09-26 12:50:50,312 DEBUG: Removing '/tmp/tmp_0xv8ud8dvc-clone'
2024-09-26 12:50:50,314 DEBUG: Removing '/tmp/tmp47ujcmg0dvc-cache'
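The failing run above prints a chain of exceptions joined by "The above exception was the direct cause of the following exception", which is how Python reports exceptions raised with `raise ... from ...`. The outermost dulwich `GitProtocolError` only repeats the message, so the useful signal is the innermost `OSError`. Here it is errno 111 (connection refused), presumably because github.com was pinned to a local address in /etc/hosts (the line the `sed` command then removes). In the original report it is errno -2 (name resolution failed). Below is a minimal sketch that walks the `__cause__` links to that root, assuming the exceptions were chained with `from`. `root_cause`, `classify` and the `GitProtocolError` stand-in are hypothetical helpers for illustration, not DVC, scmrepo or dulwich API.

```python
import errno
import socket


def root_cause(exc: BaseException) -> BaseException:
    """Follow explicit `raise ... from ...` links down to the innermost exception."""
    seen = set()  # guard against (unusual) cycles in the cause chain
    while exc.__cause__ is not None and id(exc) not in seen:
        seen.add(id(exc))
        exc = exc.__cause__
    return exc


def classify(exc: BaseException) -> str:
    """Hypothetical helper: map the root OSError to a short diagnosis."""
    root = root_cause(exc)
    if isinstance(root, socket.gaierror):
        return "name resolution failed"  # e.g. [Errno -2] Name or service not known
    if isinstance(root, OSError) and root.errno == errno.ECONNREFUSED:
        return "connection refused"  # e.g. [Errno 111] Connection refused on Linux
    return "other: " + repr(root)


class GitProtocolError(Exception):
    """Hypothetical stand-in for dulwich.errors.GitProtocolError (not the real class)."""


def simulate() -> BaseException:
    """Build a chain shaped like the traceback: refused socket -> protocol error."""
    try:
        try:
            raise ConnectionRefusedError(errno.ECONNREFUSED, "Connection refused")
        except ConnectionRefusedError as inner:
            raise GitProtocolError("Max retries exceeded") from inner
    except GitProtocolError as outer:
        return outer


print(classify(simulate()))  # connection refused
```

Telling the two errno values apart narrows the problem down: a DNS failure means the host is not resolvable directly, while a refused connection means something answered locally. In both cases the request never went through the configured proxy.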
@luhuiguo could you try installing scmrepo from the branch in https://github.com/iterative/scmrepo/pull/378 and run a few experiments?
Thanks for the reproducible environment!
$ docker run -it --rm python bash
root@fcf01db756c8:/# pip install dvc
root@fcf01db756c8:/# pip install git+https://github.com/iterative/scmrepo.git@fix-fetch-exps-under-proxy
Collecting git+https://github.com/iterative/scmrepo.git@fix-fetch-exps-under-proxy
Cloning https://github.com/iterative/scmrepo.git (to revision fix-fetch-exps-under-proxy) to /tmp/pip-req-build-v2fbye8o
.......
Successfully built scmrepo
Installing collected packages: scmrepo
Attempting uninstall: scmrepo
Found existing installation: scmrepo 3.3.7
Uninstalling scmrepo-3.3.7:
Successfully uninstalled scmrepo-3.3.7
Successfully installed scmrepo-3.3.8.dev4+gf2e18e2
root@fcf01db756c8:/# echo "127.0.0.1 github.com">> /etc/hosts
root@fcf01db756c8:/# git config --global http.proxy http://10.3.12.8:3128
root@fcf01db756c8:/# git config --global https.proxy http://10.3.12.8:3128
root@fcf01db756c8:/# dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-27 01:48:59,097 DEBUG: v3.55.2 (pip), CPython 3.12.6 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.36
2024-09-27 01:48:59,097 DEBUG: command: /usr/local/bin/dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-27 01:48:59,285 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2024-09-27 01:48:59,285 DEBUG: erepo: git clone 'https://github.com/iterative/dataset-registry' to a temporary dir
2024-09-27 01:49:09,292 DEBUG: Analytics is enabled.
2024-09-27 01:49:09,323 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp5dtejb1u', '-v']
2024-09-27 01:49:09,328 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp5dtejb1u', '-v'] with pid 219
2024-09-27 01:49:09,330 DEBUG: Removing '/tmp/tmpbjw1dcxjdvc-clone'
2024-09-27 01:49:09,333 DEBUG: Removing '/tmp/tmp6mr_z8rfdvc-cache'
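This reproduction blocks direct access: appending "127.0.0.1 github.com" to /etc/hosts sends every direct connection to github.com to the local machine, where it fails. The clone can therefore only succeed if the client routes through the configured proxy. Removing that line, as the earlier `sed '/github.com/d'` step does, presumably re-enables direct access, which would explain why the unpatched run then succeeded. Below is a minimal sketch for checking whether a host is pinned this way. It assumes the usual hosts-file format (an address followed by hostnames, `#` starting a comment). `loopback_pins` is a hypothetical helper.

```python
import ipaddress


def loopback_pins(hosts_text: str) -> set[str]:
    """Return the hostnames that a hosts file maps to a loopback address."""
    pinned = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue  # blank line, comment, or address without hostnames
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address
        if ip.is_loopback:
            pinned.update(fields[1:])
    return pinned


sample = """127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
172.17.0.3 bd4ec17f398c
127.0.0.1 github.com
"""
print("github.com" in loopback_pins(sample))  # True
```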
Okay, good. I'll try to add tests and release it as soon as possible. Thanks for your help reproducing this.
Bug Report
get/import: Name or service not known
Description
My computer is behind a proxy and needs to access a Git repository outside the proxy network. When I run dvc get/import behind the proxy, the file is not downloaded and I get the following error: [Errno -2] Name or service not known.
Configure Git to use a proxy
git clone, dvc pull, etc. all work.
But downloading a DVC-tracked file into another workspace with dvc get/import fails.
Reproduce
Expected
dvc get/import should use the Git proxy configuration.
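The expected behavior is that dvc get/import honor the same proxy setting that a plain git clone uses. Below is a minimal stdlib sketch of what that involves. It assumes the settings are taken from `git config list` output (one `key=value` per line, as pasted earlier in the thread). For a repeated single-valued key, Git uses the last value it sees, so later entries win. Git applies `http.proxy` to both http:// and https:// remotes. `urllib.request.ProxyHandler` stands in for the HTTP client here, while DVC actually goes through scmrepo and dulwich (urllib3). `effective_config` and `proxy_opener` are hypothetical helpers, not DVC or scmrepo API.

```python
import urllib.request


def effective_config(config_list_output: str) -> dict[str, str]:
    """Parse `git config list` output; for repeated keys the last value wins."""
    config = {}
    for line in config_list_output.splitlines():
        key, sep, value = line.partition("=")  # values may themselves contain '='
        if sep:  # skip blank or malformed lines
            config[key] = value
    return config


def proxy_opener(config: dict[str, str]) -> urllib.request.OpenerDirector:
    """Build an opener that sends http and https traffic through http.proxy, if set.

    An empty mapping disables proxies instead of falling back to the
    http_proxy/https_proxy environment variables.
    """
    proxy = config.get("http.proxy")
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


sample = """filter.lfs.required=true
user.name=luhuiguo
http.proxy=http://10.3.12.8:3128
"""
opener = proxy_opener(effective_config(sample))
handler = next(h for h in opener.handlers if isinstance(h, urllib.request.ProxyHandler))
print(handler.proxies)  # {'http': 'http://10.3.12.8:3128', 'https': 'http://10.3.12.8:3128'}
```

The design point is that the proxy is read from the same Git configuration that `git clone` uses, rather than from a separate DVC setting.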
Environment information
Output of dvc doctor:
Additional Information (if any):