pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0

Issue with Missing Packages and Incomplete Downloads in Bandersnatch 6.5.0 with latest_release plugin #1784

Open MatthewMyunghaKim opened 1 month ago

MatthewMyunghaKim commented 1 month ago

Hi there,

Thank you very much for offering a way to mirror the PyPI repository!

I am mirroring PyPI using Bandersnatch version 6.5.0. Initially, I mirrored all packages with the latest_release plugin but without an allowlist. However, I noticed that some packages were missing, and for some packages only a few .tar.gz files were downloaded, without the actual .whl files.

So I ran a quick test mirror with an allowlist, using the bandersnatch.conf file shown below.

I observed that paramiko and pytz were not downloaded in this test mirror. After I removed the [latest_release] section, all versions of paramiko and pytz were downloaded, but the packages that had already been downloaded did not fetch their older versions.

So, why were paramiko and pytz missing with the [allowlist] approach (this is the main issue)? And why were only paramiko and pytz downloaded after I removed the [latest_release] plugin?

Did I miss something?

[mirror]
directory=/srv/pypi
master = https://pypi.org
workers = 1
hash-index = false
stop-on-error = true
timeout = 300.0
json = true
storage-backend = filesystem
allowlist_package_formats = bdist_wheel

[plugins]
enabled =
    blocklist_project
    blocklist_release
    allowlist_release
    allowlist_project
    exclude_platform
    latest_release
;    custom_filter

[blocklist]
platforms =
    windows
    macos
    freebsd
    py2.4
    py2.5
    py2.6
    py2.7
    py3.1
    py3.2
    py3.3
    py3.4
    py3.5
    py3.6
    py3.7
    py3.8
    py3.9
    py3.10

; currently only downloading ver 3.11 and 3.12

[allowlist]
packages =
    numpy
    pandas
    unittest
    cryptography
    paramiko
    pytz
    safety
    authlib
    pysftp

[latest_release]
    keep = 3
    sort_by = [version|time]
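One way to double-check how the plugin-related settings in this file are read is to load them with Python's standard configparser. This is a minimal sketch, not bandersnatch's own config loader (which may apply its own defaults or validation); `read_plugin_settings` is a hypothetical helper, and the embedded text is a trimmed subset of the config above. It shows that the commented-out custom_filter line is ignored and that sort_by is read as the literal string `[version|time]`; whether bandersnatch accepts that value is not something this sketch can tell.

```python
import configparser

# Trimmed subset of the bandersnatch.conf above (plugin-related sections only).
CONF = """\
[plugins]
enabled =
    blocklist_project
    blocklist_release
    allowlist_release
    allowlist_project
    exclude_platform
    latest_release
;    custom_filter

[allowlist]
packages =
    numpy
    paramiko
    pytz

[latest_release]
    keep = 3
    sort_by = [version|time]
"""


def read_plugin_settings(text: str) -> dict:
    """Hypothetical helper: parse the config text and pull out filter settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    latest = parser["latest_release"]
    return {
        # Multi-line values come back as one string; split() turns them into lists.
        "enabled": parser["plugins"]["enabled"].split(),
        "packages": parser["allowlist"]["packages"].split(),
        "keep": latest.getint("keep"),
        # Read verbatim: configparser does not interpret the brackets or the "|".
        "sort_by": latest.get("sort_by"),
    }


if __name__ == "__main__":
    for key, value in read_plugin_settings(CONF).items():
        print(f"{key}: {value}")
```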

Before removing [latest_release]:

2024-08-01 21:45:36,600 INFO: Downloading: https://files.pythonhosted.org/packages/35/9d/208febf8c4eb5c1d9ea3314d52d8bd415fd0ef0dd66bb24cc5bdbc8fa71a/pandas-2.2.2-cp312-cp312-musllinux_1_1_aarch64.whl (mirror.py:876)
2024-08-01 21:45:38,012 INFO: Downloading: https://files.pythonhosted.org/packages/99/d1/2d9bd05def7a9e08a92ec929b5a4c8d5556ec76fae22b0fa486cbf33ea63/pandas-2.2.2-cp312-cp312-musllinux_1_1_x86_64.whl (mirror.py:876)
2024-08-01 21:45:39,022 INFO: Downloading: https://files.pythonhosted.org/packages/88/d9/ecf715f34c73ccb1d8ceb82fc01cd1028a65a5f6dbc57bfa6ea155119058/pandas-2.2.2.tar.gz (mirror.py:876)
2024-08-01 21:45:39,359 INFO: Storing index page(s): pandas - in /srv/pypi/web/simple/pandas (mirror.py:699)
2024-08-01 21:45:39,360 INFO: Fetching metadata for package: paramiko (serial 21112999) (package.py:58)
2024-08-01 21:45:39,397 INFO: Fetching metadata for package: pysftp (serial 2204783) (package.py:58)
2024-08-01 21:45:39,704 INFO: Downloading: https://files.pythonhosted.org/packages/52/2b/4f4c5dc77543f443e528d42da73ec7ba4157f69086f1db0dc3f5e7d28d90/pysftp-0.2.7.tar.gz (mirror.py:876)
2024-08-01 21:45:40,012 INFO: Downloading: https://files.pythonhosted.org/packages/fd/35/6212ecdec169c9dbbf23ae7e089a78fbe70a7b04830e3c2a8ac2bb1c8ca1/pysftp-0.2.8.tar.gz (mirror.py:876)
2024-08-01 21:45:40,241 INFO: Downloading: https://files.pythonhosted.org/packages/36/60/45f30390a38b1f92e0a8cf4de178cd7c2bc3f874c85430e40ccf99df8fe7/pysftp-0.2.9.tar.gz (mirror.py:876)
2024-08-01 21:45:40,251 INFO: Storing index page(s): pysftp - in /srv/pypi/web/simple/pysftp (mirror.py:699)
2024-08-01 21:45:40,253 INFO: Fetching metadata for package: pytz (serial 21700107) (package.py:58)
2024-08-01 21:45:40,291 INFO: Fetching metadata for package: safety (serial 24355884) (package.py:58)
2024-08-01 21:45:40,534 INFO: Downloading: https://files.pythonhosted.org/packages/a7/20/4ddd57eaaf79bd541116cca461b43d7c3c49dccca89386245725eb982700/safety-3.2.2-py3-none-any.whl (mirror.py:876)
2024-08-01 21:45:40,851 INFO: Downloading: https://files.pythonhosted.org/packages/0b/62/7a74c09a945a2ac679042dd948768d85ecd39d6d85d91b3d6fa2dc550db6/safety-3.2.2.tar.gz (mirror.py:876)
2024-08-01 21:45:41,163 INFO: Downloading: https://files.pythonhosted.org/packages/36/ca/763d608fc139207ca42367d13b778c7c0e267918a7d61a39ca780267810b/safety-3.2.3-py3-none-any.whl (mirror.py:876)
2024-08-01 21:45:41,184 INFO: Downloading: https://files.pythonhosted.org/packages/ff/23/7bb834b5f8ea12c120086c8a53395f7ffe47fca2c2a6b40640b90400ae03/safety-3.2.3.tar.gz (mirror.py:876)
2024-08-01 21:45:41,463 INFO: Downloading: https://files.pythonhosted.org/packages/59/6c/bf6fcfbf1daf5add740cd7f276e6c5f6a383e10f12f08c47bc321a076e4d/safety-3.2.4-py3-none-any.whl (mirror.py:876)
2024-08-01 21:45:41,483 INFO: Downloading: https://files.pythonhosted.org/packages/af/bb/723f294df65939d61cd35cba6c9c6c95bd2ce7f3822a45ba9e836cf034e3/safety-3.2.4.tar.gz (mirror.py:876)
2024-08-01 21:45:41,506 INFO: Storing index page(s): safety - in /srv/pypi/web/simple/safety (mirror.py:699)
2024-08-01 21:45:41,507 INFO: Fetching metadata for package: unittest (serial 187609) (package.py:58)
2024-08-01 21:45:41,771 INFO: Downloading: https://files.pythonhosted.org/packages/51/58/e37e078a2e02093f86cb44bf2acda0e2b94a575c2df7fcefda6a3eefdd04/unittest2-0.0.0.tar.gz (mirror.py:876)
2024-08-01 21:45:42,340 INFO: Storing index page(s): unittest - in /srv/pypi/web/simple/unittest (mirror.py:699)
2024-08-01 21:45:42,342 INFO: Generating global index page. (simple.py:264)
2024-08-01 21:45:42,443 INFO: 7 packages had changes (mirror.py:991)
2024-08-01 21:45:42,443 INFO: Writing diff file to mirrored-files (mirror.py:1001)
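In the log above, paramiko and pytz each get a "Fetching metadata" entry, but no downloads and no "Storing index page(s)" entry follow. A minimal sketch for spotting such packages in a larger log, assuming (inferred from this log, not from bandersnatch documentation) that every fully synced package ends with a "Storing index page(s)" entry. `silently_skipped` is a hypothetical helper, and the sample is an excerpt of the entries above. It searches with regular expressions instead of splitting on lines, so it also works on log text that has been run together.

```python
import re

# Hypothetical helper, not part of bandersnatch. The patterns match the INFO
# messages that appear in the log above.
FETCH_RE = re.compile(r"Fetching metadata for package: (\S+) \(serial \d+\)")
STORE_RE = re.compile(r"Storing index page\(s\): (\S+) - in ")


def silently_skipped(log_text: str) -> list[str]:
    """Return packages whose metadata was fetched but whose index page was never stored."""
    fetched = FETCH_RE.findall(log_text)
    stored = set(STORE_RE.findall(log_text))
    # dict.fromkeys de-duplicates while keeping the order packages were fetched in.
    return [name for name in dict.fromkeys(fetched) if name not in stored]


# Excerpt of the "before removing [latest_release]" log, deliberately run together.
SAMPLE = (
    "2024-08-01 21:45:39,359 INFO: Storing index page(s): pandas - in /srv/pypi/web/simple/pandas (mirror.py:699) "
    "2024-08-01 21:45:39,360 INFO: Fetching metadata for package: paramiko (serial 21112999) (package.py:58) "
    "2024-08-01 21:45:39,397 INFO: Fetching metadata for package: pysftp (serial 2204783) (package.py:58) "
    "2024-08-01 21:45:40,251 INFO: Storing index page(s): pysftp - in /srv/pypi/web/simple/pysftp (mirror.py:699) "
    "2024-08-01 21:45:40,253 INFO: Fetching metadata for package: pytz (serial 21700107) (package.py:58) "
    "2024-08-01 21:45:40,291 INFO: Fetching metadata for package: safety (serial 24355884) (package.py:58) "
    "2024-08-01 21:45:41,506 INFO: Storing index page(s): safety - in /srv/pypi/web/simple/safety (mirror.py:699) "
)

if __name__ == "__main__":
    print(silently_skipped(SAMPLE))  # ['paramiko', 'pytz']
```

With more than one worker the entries interleave, but because the check uses sets of names rather than adjacency, the result should not depend on ordering.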

This is the downloaded package list (pypi/web/simple/):

authlib
cryptography
numpy
pandas
pysftp
safety
unittest

After removing [latest_release]:

2024-08-01 22:43:20,706 INFO: Status file /srv/pypi/status missing. Starting over. (mirror.py:567)
2024-08-01 22:43:20,706 INFO: Syncing with https://pypi.org. (mirror.py:58)
2024-08-01 22:43:20,706 INFO: Current mirror serial: 0 (mirror.py:279)
2024-08-01 22:43:20,706 INFO: Resuming interrupted sync from local todo list. (mirror.py:286)
2024-08-01 22:43:20,706 INFO: Package 'paramiko' is allowlisted (allowlist_name.py:91)
2024-08-01 22:43:20,706 INFO: Package 'pytz' is allowlisted (allowlist_name.py:91)
2024-08-01 22:43:20,706 INFO: Trying to reach serial: 24382546 (mirror.py:311)
2024-08-01 22:43:20,706 INFO: 2 packages to sync. (mirror.py:313)
2024-08-01 22:43:20,706 INFO: No metadata filters are enabled. Skipping metadata filtering (mirror.py:77)
2024-08-01 22:43:20,706 INFO: Fetching metadata for package: paramiko (serial 21112999) (package.py:58)
2024-08-01 22:43:20,757 INFO: Downloading: https://files.pythonhosted.org/packages/65/99/f8bdeb157d87dd6924ca9042f768e34912bcfd8d0a814fd18beef2b30b63/paramiko-1.0.zip (mirror.py:876)
2024-08-01 22:43:21,080 INFO: Downloading: https://files.pythonhosted.org/packages/4d/e2/9a6715acbac15c781276b67e6100ba118958b3c59bd1bd1d1e30cef40e0e/paramiko-1.1.zip (mirror.py:876)
2024-08-01 22:43:21,374 INFO: Downloading: https://files.pythonhosted.org/packages/10/4b/21ae5fe869a724f59cd036fafac3477a9dc195f398dacb3b66eb52afd90e/paramiko-1.10.0.tar.gz (mirror.py:876)
2024-08-01 22:43:21,700 INFO: Downloading: https://files.pythonhosted.org/packages/bc/f0/204504e800922bbfb6fdc8013d07f52bb8b1f84e611e2877806a43d5d129/paramiko-1.10.1.tar.gz (mirror.py:876)
2024-08-01 22:43:21,994 INFO: Downloading: https://files.pythonhosted.org/packages/d3/e8/6155457232158f3336e7ea569905bb5f2b1c84a52fea6941707c2581e0a0/paramiko-1.10.2.tar.gz (mirror.py:876)
........

2024-08-01 22:44:46,664 INFO: Fetching metadata for package: pytz (serial 21700107) (package.py:58)
2024-08-01 22:44:46,723 INFO: Downloading: https://files.pythonhosted.org/packages/3f/9f/22fd479e19353f6645a61577885f159f7077a4d657500f90eef54c24a46f/pytz-2004a.tar.gz (mirror.py:876)
2024-08-01 22:44:47,066 INFO: Downloading: https://files.pythonhosted.org/packages/de/c2/34a1626156e249a6f76978dc0e8d97b159ea69d0c1884eacf88e98c0db97/pytz-2004b.tar.gz (mirror.py:876)
2024-08-01 22:44:47,341 INFO: Downloading: https://files.pythonhosted.org/packages/16/a4/0231bac40333c820ca731bf100c99dbae1d66f26d3cd21de9f6303ab5c79/pytz-2004b.2.tar.gz (mirror.py:876)
2024-08-01 22:44:47,679 INFO: Downloading: https://files.pythonhosted.org/packages/03/98/276bd2f074e24833bb472cefd3b535ed6aed7ffa7029218a64fae5784b61/pytz-2005a.tar.gz (mirror.py:876)
2024-08-01 22:44:47,911 INFO: Downloading: https://files.pythonhosted.org/packages/bf/8b/c84177045c8a2e09adfeb327bf24874a14a3d713de9956a27f4bfd0ec998/pytz-2005r.tar.bz2 (mirror.py:876)
........

This is the downloaded package list (pypi/web/simple/):

authlib
cryptography
numpy
pandas
paramiko
pysftp
pytz
safety
unittest

(Now it includes paramiko and pytz)

Looking forward to your opinion and suggestions. Thank you.

MatthewMyunghaKim commented 1 month ago

Regarding the previous issue report #1784: the two packages appear to behave similarly; however, paramiko follows the PEP 440 standard but still failed to download with the [latest_release] plugin, whereas pytz does not follow PEP 440.
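To check which of a project's release strings follow PEP 440 (for example, the keys of `releases` in https://pypi.org/pypi/paramiko/json), a stdlib-only sketch can reuse the version regular expression from PEP 440's Appendix B, which the packaging library also exposes as `packaging.version.VERSION_PATTERN`. `split_by_pep440` is a hypothetical helper; in practice, calling `packaging.version.Version(v)` and catching `packaging.version.InvalidVersion` does the same job. The sample inputs come from the debug log posted later in this thread, where paramiko==0.1-bulbasaur and similar releases are reported as having an invalid version.

```python
import re

# Version regex from PEP 440, Appendix B (packaging exposes the same string as
# packaging.version.VERSION_PATTERN).
VERSION_PATTERN = r"""
    v?
    (?:
        (?:(?P<epoch>[0-9]+)!)?                           # epoch
        (?P<release>[0-9]+(?:\.[0-9]+)*)                  # release segment
        (?P<pre>                                          # pre-release
            [-_\.]?
            (?P<pre_l>(a|b|c|rc|alpha|beta|pre|preview))
            [-_\.]?
            (?P<pre_n>[0-9]+)?
        )?
        (?P<post>                                         # post release
            (?:-(?P<post_n1>[0-9]+))
            |
            (?:
                [-_\.]?
                (?P<post_l>post|rev|r)
                [-_\.]?
                (?P<post_n2>[0-9]+)?
            )
        )?
        (?P<dev>                                          # dev release
            [-_\.]?
            (?P<dev_l>dev)
            [-_\.]?
            (?P<dev_n>[0-9]+)?
        )?
    )
    (?:\+(?P<local>[a-z0-9]+(?:[-_\.][a-z0-9]+)*))?       # local version
"""

_VERSION_RE = re.compile(r"^\s*" + VERSION_PATTERN + r"\s*$", re.VERBOSE | re.IGNORECASE)


def split_by_pep440(versions: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split release strings into (valid, invalid) per PEP 440."""
    valid: list[str] = []
    invalid: list[str] = []
    for version in versions:
        (valid if _VERSION_RE.match(version) else invalid).append(version)
    return valid, invalid


if __name__ == "__main__":
    valid, invalid = split_by_pep440(["0.1-bulbasaur", "0.9-doduo", "1.0", "1.10.0", "3.4.0", "2004a"])
    print("valid:", valid)      # ['1.0', '1.10.0', '3.4.0', '2004a']
    print("invalid:", invalid)  # ['0.1-bulbasaur', '0.9-doduo']
```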

cooperlees commented 1 month ago

Hi,

Thanks for the report. To get any real idea of what is happening, I would need the debug logs for these two packages, both with and without the [latest_release] plugin. So please run with:

bandersnatch --debug mirror ...

And please set stop-on-error in the config file:

stop-on-error = true

But it does seem you have that set in your example config. I also don't know how much debug logging the [latest_release] plugin has. I'm also guessing it's not failing with any error, but simply silently skipping valid versions you expect it to fetch?

I'm guessing we'll have to add debug logging and more error output to pinpoint this bug, and I expect it to be in the plugin itself, since running without the plugin allows all the versions to download. I'm short on time, so I'll take any help debugging this.

MatthewMyunghaKim commented 1 month ago

Thanks for your response. I've attached the full log file.

I ran the command with the --debug option. This time, the paramiko files were downloaded.

2024-08-07 08:23:28,826 INFO: Fetching metadata for package: paramiko (serial 21112999) (package.py:58)
2024-08-07 08:23:28,826 DEBUG: Getting /pypi/paramiko/json (serial 21112999) (master.py:127)
2024-08-07 08:23:29,113 DEBUG: Writing temporary file /srv/pypi/web/json/.paramiko.26oum5w1 to target destination: /srv/pypi/web/json/paramiko (filesystem.py:94)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.1-bulbasaur has an invalid version (blocklist_name.py:173)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.1-bulbasaur has an invalid version (allowlist_name.py:246)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.1-charmander has an invalid version (blocklist_name.py:173)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.1-charmander has an invalid version (allowlist_name.py:246)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-doduo has an invalid version (blocklist_name.py:173)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-doduo has an invalid version (allowlist_name.py:246)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-eevee has an invalid version (blocklist_name.py:173)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-eevee has an invalid version (allowlist_name.py:246)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-fearow has an invalid version (blocklist_name.py:173)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-fearow has an invalid version (allowlist_name.py:246)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-gyarados has an invalid version (blocklist_name.py:173)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-gyarados has an invalid version (allowlist_name.py:246)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-horsea has an invalid version (blocklist_name.py:173)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-horsea has an invalid version (allowlist_name.py:246)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-ivysaur has an invalid version (blocklist_name.py:173)
2024-08-07 08:23:29,114 DEBUG: Package paramiko==0.9-ivysaur has an invalid version (allowlist_name.py:246)
2024-08-07 08:23:29,114 DEBUG: MATCH: Release paramiko==1.0 matches specifier (allowlist_name.py:252)
2024-08-07 08:23:29,114 DEBUG: MATCH: Release paramiko==1.1 matches specifier (allowlist_name.py:252)
2024-08-07 08:23:29,115 DEBUG: MATCH: Release paramiko==1.10.0 matches specifier (allowlist_name.py:252)
2024-08-07 08:23:29,115 DEBUG: MATCH: Release paramiko==1.10.1 matches specifier (allowlist_name.py:252)
2024-08-07 08:23:29,116 DEBUG: MATCH: Release paramiko==1.10.2 matches specifier (allowlist_name.py:252)
2024-08-07 08:23:29,116 DEBUG: MATCH: Release paramiko==1.10.3 matches specifier (allowlist_name.py:252)
.........
2024-08-07 08:23:29,144 DEBUG: MATCH: Release paramiko==3.3.0 matches specifier (allowlist_name.py:252)
2024-08-07 08:23:29,144 DEBUG: MATCH: Release paramiko==3.3.1 matches specifier (allowlist_name.py:252)
2024-08-07 08:23:29,144 DEBUG: MATCH: Release paramiko==3.4.0 matches specifier (allowlist_name.py:252)
2024-08-07 08:23:29,145 INFO: Downloading: https://files.pythonhosted.org/packages/55/62/6cf369c3faaba30287871af7754977770aa77a402e9850de5d2bc2542ec6/paramiko-3.3.0-py3-none-any.whl (mirror.py:876)
2024-08-07 08:23:29,145 DEBUG: Getting https://files.pythonhosted.org/packages/55/62/6cf369c3faaba30287871af7754977770aa77a402e9850de5d2bc2542ec6/paramiko-3.3.0-py3-none-any.whl (serial None) (master.py:127)
2024-08-07 08:23:29,170 DEBUG: Writing temporary file /srv/pypi/web/packages/55/62/6cf369c3faaba30287871af7754977770aa77a402e9850de5d2bc2542ec6/.paramiko-3.3.0-py3-none-any.whl.hqoqx_0j to target destination: /srv/pypi/web/packages/55/62/6cf369c3faaba30287871af7754977770aa77a402e9850de5d2bc2542ec6/paramiko-3.3.0-py3-none-any.whl (filesystem.py:94)
.........

(myvenv) matthew@matilda:/etc$ ls -al /srv/pypi/web/simple/
total 52
drwxr-xr-x 10 root root 4096 Aug 7 08:23 .
drwxr-xr-x 7 root root 4096 Aug 7 08:21 ..
drwxr-xr-x 2 root root 4096 Aug 7 08:21 authlib
drwxr-xr-x 2 root root 4096 Aug 7 08:21 cryptography
-rw-r--r-- 1 root root 480 Aug 7 08:23 index.html
-rw-r--r-- 1 root root 480 Aug 7 08:23 index.v1_html
-rw-r--r-- 1 root root 235 Aug 7 08:23 index.v1_json
drwxr-xr-x 2 root root 4096 Aug 7 08:22 numpy
drwxr-xr-x 2 root root 4096 Aug 7 08:23 pandas
drwxr-xr-x 2 root root 4096 Aug 7 08:23 paramiko
drwxr-xr-x 2 root root 4096 Aug 7 08:23 pysftp
drwxr-xr-x 2 root root 4096 Aug 7 08:23 safety
drwxr-xr-x 2 root root 4096 Aug 7 08:23 unittest
(myvenv) matthew@matilda:/etc$ python --version
Python 3.12.3
(myvenv) matthew@matilda:/etc$

This is for pytz:

2024-08-07 08:23:31,073 INFO: Fetching metadata for package: pytz (serial 21700107) (package.py:58)
2024-08-07 08:23:31,073 DEBUG: Getting /pypi/pytz/json (serial 21700107) (master.py:127)
2024-08-07 08:23:31,396 DEBUG: Writing temporary file /srv/pypi/web/json/.pytz.zok4tgv6 to target destination: /srv/pypi/web/json/pytz (filesystem.py:94)
2024-08-07 08:23:31,398 DEBUG: MATCH: Release pytz==2004a0 matches specifier (allowlist_name.py:252)

bandersnatch.log

MatthewMyunghaKim commented 1 month ago

I don't understand why running with or without the --debug option makes a difference for the paramiko package.

My suggestion: for packages that do not follow the PEP 440 standard, could we add an option to either ignore those packages or download all of their versions?

cooperlees commented 4 weeks ago

I'm happy to accept a PR that allows people to download non-PEP 440 versions when using this plugin, and that also checks we behave the same way without any plugins (explicit tests would be great here).

My time is limited, but if someone beats me to it, I'll review. These releases are generally old, so I would have thought no one wants them, but we are mirroring software, not smart decision software :)

allamiro commented 3 weeks ago

> I don't understand why running with or without the --debug option makes a difference for the paramiko package.
>
> My suggestion: for packages that do not follow the PEP 440 standard, could we add an option to either ignore those packages or download all of their versions?

A balanced option would be to provide users with the choice through configuration but default to ignoring non-PEP 440 versions.
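The suggestion above (a configuration choice that defaults to ignoring non-PEP 440 versions) could look roughly like the following. This is a hypothetical sketch, not bandersnatch's latest_release plugin: `keep_latest`, its `on_invalid` parameter, and the toy `numeric_key` parser are illustrative names, and `numeric_key` understands only plain dotted numbers so the example stays stdlib-only. The point it illustrates is that an unparseable version is handled per release, so one bad version string cannot affect the rest of the package.

```python
from typing import Callable, Iterable, Literal


def numeric_key(version: str) -> tuple[int, ...]:
    """Toy stand-in for a real version parser: accepts only N(.N)* strings.

    int() raises ValueError for anything else, e.g. "0.1-bulbasaur".
    """
    return tuple(int(part) for part in version.split("."))


def keep_latest(
    versions: Iterable[str],
    keep: int,
    on_invalid: Literal["ignore", "keep"] = "ignore",
    parse: Callable[[str], object] = numeric_key,
) -> set[str]:
    """Return the set of release strings to mirror (hypothetical sketch).

    The newest `keep` parseable versions are selected (keep <= 0 means no
    limit in this sketch). Unparseable versions are dropped when
    on_invalid == "ignore" and always mirrored when on_invalid == "keep".
    """
    ranked = []
    invalid = []
    for version in versions:
        try:
            ranked.append((parse(version), version))
        except ValueError:
            invalid.append(version)
    ranked.sort()  # orders by parsed key, so "1.10" ranks above "1.2"
    if keep > 0:
        ranked = ranked[-keep:]
    selected = {version for _, version in ranked}
    if on_invalid == "keep":
        selected.update(invalid)
    return selected


if __name__ == "__main__":
    releases = ["0.1-bulbasaur", "0.9-doduo", "1.0", "1.10.0", "3.3.0", "3.3.1", "3.4.0"]
    print(sorted(keep_latest(releases, keep=3)))
    # ['3.3.0', '3.3.1', '3.4.0']
    print(sorted(keep_latest(releases, keep=3, on_invalid="keep")))
    # ['0.1-bulbasaur', '0.9-doduo', '3.3.0', '3.3.1', '3.4.0']
```

For real PEP 440 ordering you would pass `parse=packaging.version.Version`; its `InvalidVersion` exception subclasses `ValueError`, so the same `except` clause catches it. Ranking by a parsed key rather than the raw string matters: compared as plain strings, "1.10" sorts before "1.2".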