pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0
Fails on the same package when using proxy #510

Closed MikeHofmann closed 4 years ago

MikeHofmann commented 4 years ago

I'm behind a firewall and can't access pypi.org directly. Instead, I use a squid3 proxy with squidguard to reach pypi.org. We provide the proxy address via environment variables like so:

https_proxy=http://10.0.0.1:3128
http_proxy=http://10.0.0.1:3128

Please note that the proxy URL uses http (without the s).
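
As a quick sanity check that such variables are visible to a Python process, the standard library can report which proxy it would pick per URL scheme. This is only a sketch: `proxy_for` is a hypothetical helper, it ignores `no_proxy`, and it does not show what bandersnatch itself does (aiohttp only consults these variables when the session is created with `trust_env=True`).

```python
import os
import urllib.request
from urllib.parse import urlsplit


def proxy_for(url):
    """Return the proxy URL the environment selects for `url`, or None.

    Hypothetical helper for illustration; no_proxy rules are not applied.
    """
    scheme = urlsplit(url).scheme
    # getproxies() collects <scheme>_proxy variables from the environment.
    return urllib.request.getproxies().get(scheme)


if __name__ == "__main__":
    # Values copied from the report above; adjust to your own proxy.
    os.environ["https_proxy"] = "http://10.0.0.1:3128"
    os.environ["http_proxy"] = "http://10.0.0.1:3128"
    print(proxy_for("https://pypi.org/simple/"))
    print(proxy_for("http://pypi.org/simple/"))
```

Both lookups should report the same plain-http proxy address, since an https target can still be tunnelled through a proxy that is itself reached over http.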

Mirroring now fails every time on the same set of packages, for example:

2020-05-10 17:49:18,349 INFO: Downloading: https://files.pythonhosted.org/packages/5c/bf/87cbf96fe970f48082c7c919da12dc2060a6d29d67a5b9161b7209b3c59c/OctoBot_Trading-1.6.5-py3.7-win32.egg
2020-05-10 17:49:18,376 ERROR: Error syncing package: operations@1199117
Traceback (most recent call last):
  File "/bandersnatch/src/bandersnatch/package.py", line 144, in sync
    metadata_response = await metadata_generator.asend(None)
  File "/bandersnatch/src/bandersnatch/master.py", line 105, in get
    async with self.session.get(path, timeout=timeout, **kw) as r:
  File "/usr/local/lib/python3.8/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiohttp/client.py", line 504, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 860, in start
    self._continue = None
  File "/usr/local/lib/python3.8/site-packages/aiohttp/helpers.py", line 596, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

I can, however, download the same packages without any problems using, for example, wget.

Additional info:

bandersnatch --version
bandersnatch 4.0.3

grep -Ev "^(;|$)" /conf/bandersnatch.conf 
[mirror]
directory = /mnt/pypi
json = false
master = https://pypi.org
timeout = 10
workers = 3
hash-index = false
stop-on-error = false
verifiers = 3
[blacklist]
packages =
    example1
    example2

cooperlees commented 4 years ago

I’m not sure how I can act on this. I could set up a proxy and see whether my mirroring fails, but I would expect it to just work, so this will be hard to reproduce.

The fact that it is always the same files suggests to me that your proxy may be caching the upstream errors it received?

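I’d suggest trying:

- 1 or 2 workers - do you still fail?
- Use tcpdump to compare wget (working) behavior and aiohttp...

P.S. I’d also suggest taking the example blacklist entries out of your config :)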

cooperlees commented 4 years ago

Also, the settings I suggested tweaking here might help too: https://github.com/pypa/bandersnatch/issues/511#issuecomment-626748283

MikeHofmann commented 4 years ago

I moved to a host with a direct internet connection. The underlying problem seems to persist even when no proxy is involved.

> The fact that it is always the same files suggests to me that your proxy may be caching the upstream errors it received?

The proxy only whitelists specific sites and doesn't do any caching.

> 1 or 2 workers - do you still fail?

Switched to 1 worker.

> Use tcpdump to compare wget (working) behavior and aiohttp...

Working on it.

I changed my config a little:

root@5a77221e3b7d:/# grep -Ev "^(;|$)" /etc/bandersnatch.conf 
[mirror]
directory = /mnt/pypi
json = false
master = https://pypi.org
timeout = 10
workers = 1
hash-index = false
stop-on-error = false
verifiers = 3
[plugins]
enabled =
    whitelist_project
[whitelist]
packages =
    tf-nightly

So it now tries to mirror only tf-nightly (the operations package seems to be gone from pypi.org).
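
To double-check what a config like this actually contains, it can be read with Python's standard `configparser`. This is a minimal sketch, not bandersnatch's own config loader: `load_settings` and `CONF_TEXT` are hypothetical names, and the text is a shortened copy of the config above.

```python
import configparser

# Shortened copy of the config shown above (illustration only).
CONF_TEXT = """
[mirror]
directory = /mnt/pypi
timeout = 10
workers = 1
[plugins]
enabled =
    whitelist_project
[whitelist]
packages =
    tf-nightly
"""


def load_settings(text):
    """Return (timeout, workers, whitelisted packages) from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    timeout = parser.getint("mirror", "timeout")
    workers = parser.getint("mirror", "workers")
    # Indented continuation lines form one multi-line value; split on whitespace.
    packages = parser.get("whitelist", "packages").split()
    return timeout, workers, packages


if __name__ == "__main__":
    print(load_settings(CONF_TEXT))
```

Reading the values back this way makes it easy to see that the timeout is still 10 seconds and that only one package is whitelisted.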

I got this running inside a Docker container (based on ubuntu:bionic). This allows me to test on a local machine as well as from our internal site under mostly the same conditions (except for the proxy).

MikeHofmann commented 4 years ago

OK, I believe I found the problem. I changed timeout to 120 in bandersnatch.conf and now the mirror runs. Since this has nothing to do with our proxy, I'm closing this issue, but I will contribute to #511.
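
The effect of a too-short timeout can be reproduced without any network access. The sketch below uses plain `asyncio` rather than aiohttp or bandersnatch code: `slow_fetch` is a hypothetical stand-in for a slow response, and the delays are scaled down from seconds to fractions of a second.

```python
import asyncio


async def slow_fetch(delay):
    """Stand-in for a slow HTTP response (hypothetical, no network I/O)."""
    await asyncio.sleep(delay)
    return "ok"


async def fetch_with_timeout(delay, timeout):
    """Return the result, or "timeout" if the deadline passes first."""
    try:
        return await asyncio.wait_for(slow_fetch(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Same exception type as at the bottom of the traceback above.
        return "timeout"


if __name__ == "__main__":
    # Deadline shorter than the response time: the request is abandoned.
    print(asyncio.run(fetch_with_timeout(0.2, 0.05)))  # prints "timeout"
    # Deadline longer than the response time: the request completes.
    print(asyncio.run(fetch_with_timeout(0.2, 1.0)))   # prints "ok"
```

This would be consistent with the same packages failing on every run: a response that reliably takes longer than the configured timeout is cut off each time, regardless of the proxy.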