pypa / bandersnatch

A PyPI mirror client according to PEP 381 http://www.python.org/dev/peps/pep-0381/
Academic Free License v3.0

Add TODO file cleanup to avoid a single package blocking the entire sync process #1434

Open 89ao opened 1 year ago

89ao commented 1 year ago

As the following log shows, the packages "oreo4" and "spanishconjugator" were not being updated. The log reveals that "oreo4" no longer exists on PyPI and that "spanishconjugator" failed the serial check. A failure while updating an individual package should not block the overall sync; otherwise the sync stays stuck in a loop on these two packages forever.

# cat /yum/pip/todo
17825673
oreo4 17825509
spanishconjugator 17825562
2023-04-24 14:05:42 bandersnatch.package: INFO Fetching metadata for package: oreo4 (serial 17825509)
2023-04-24 14:05:42 bandersnatch.package: INFO Fetching metadata for package: spanishconjugator (serial 17825562)
2023-04-24 14:05:42 bandersnatch.package: ERROR Stale serial for package spanishconjugator - Attempt 1
2023-04-24 14:05:42 bandersnatch.package: INFO oreo4 no longer exists on PyPI
2023-04-24 14:05:43 bandersnatch.package: INFO Fetching metadata for package: spanishconjugator (serial 17825562)
2023-04-24 14:05:43 bandersnatch.package: ERROR Stale serial for package spanishconjugator - Attempt 2
2023-04-24 14:05:45 bandersnatch.package: INFO Fetching metadata for package: spanishconjugator (serial 17825562)
2023-04-24 14:05:45 bandersnatch.package: ERROR Stale serial for package spanishconjugator - Attempt 3
2023-04-24 14:05:45 bandersnatch.package: ERROR Stale serial for spanishconjugator (17825562) not updating. Giving up.
2023-04-24 14:05:45 bandersnatch.mirror: ERROR Error syncing package: spanishconjugator@17825562
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/bandersnatch/package.py", line 61, in update_metadata
    self._metadata = await master.get_package_metadata(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bandersnatch/master.py", line 220, in get_package_metadata
    metadata_response = await metadata_generator.asend(None)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bandersnatch/master.py", line 138, in get
    await self.check_for_stale_cache(path, required_serial, got_serial)
  File "/usr/local/lib/python3.11/site-packages/bandersnatch/master.py", line 117, in check_for_stale_cache
    raise StalePage(
bandersnatch.master.StalePage: Expected PyPI serial 17825562 for request https://pypi.org//pypi/spanishconjugator/json but got 17825558. We can no longer issue a PURGE. Report issue to PyPA Warehouse GitHub if it persists ...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/bandersnatch/mirror.py", line 129, in package_syncer
    await package.update_metadata(self.master, attempts=3)
  File "/usr/local/lib/python3.11/site-packages/bandersnatch/package.py", line 86, in update_metadata
    raise error_class(package_name=self.name, attempts=attempts)
bandersnatch.errors.StaleMetadata: Stale serial for spanishconjugator after 3 attempts
2023-04-24 14:05:45 bandersnatch.simple: INFO Generating global index page.
2023-04-24 14:05:49 bandersnatch.mirror: INFO 0 packages had changes

Please help me resolve these issues.

cooperlees commented 1 year ago

Howdy - can we also get your bandersnatch.conf added to the PR so that your usage can be confirmed and added to tests, please?

89ao commented 1 year ago

@cooperlees here is my bandersnatch.conf:

[mirror]
directory = /opt/bandersnatch
storage-backend = filesystem
master = https://pypi.org/
json = true
timeout = 300
workers = 3
hash-index = false
stop-on-error = false
delete-packages = true
compare-method = stat
log-config = /conf/bandersnatch-log.conf

[plugins]
enabled =
    blocklist_project
    blocklist_release
    regex_project

[blocklist]
packages =
    uselesscapitalquiz
    tf-nightly-gpu
    tf-nightly
    tensorflow-io-nightly
    tf-nightly-cpu
    pyagrum-nightly
    appium

[filter_regex]
packages =
    .+-nightly.*

89ao commented 1 year ago

The issue happened again. When I check banderlogfile.log there seem to be no errors, but the /yum/pip/todo file always keeps these "no longer exists" packages, which prevents new synchronization tasks from running.

Recently our PyPI repo has been out of sync many times. Please help me solve this problem. @cooperlees

bandersnatch version: 6.0.0

[root@VM_21_104_centos /data/home/motorao/bandersnatch]# tail -n 10 /yum/pip/banderlogfile.log
2023-05-31 19:02:01 bandersnatch.package: INFO zhanlan1 no longer exists on PyPI
2023-05-31 19:02:01 bandersnatch.package: INFO Fetching metadata for package: zlkj (serial 18121307)
2023-05-31 19:02:01 bandersnatch.package: INFO zhanlanpkg no longer exists on PyPI
2023-05-31 19:02:01 bandersnatch.package: INFO Fetching metadata for package: zwhrce (serial 18119235)
2023-05-31 19:02:01 bandersnatch.package: INFO zhanlanu no longer exists on PyPI
2023-05-31 19:02:01 bandersnatch.package: INFO zlkj no longer exists on PyPI
2023-05-31 19:02:01 bandersnatch.package: INFO zwhrce no longer exists on PyPI
2023-05-31 19:02:01 bandersnatch.simple: INFO Generating global index page.
2023-05-31 19:02:06 bandersnatch.mirror: INFO 0 packages had changes
2023-05-31 19:02:06 bandersnatch.mirror: INFO Writing diff file to mirrored-files

todo.zip

cooperlees commented 1 year ago

So, what I believe is happening here is that you ran a sync and it failed somehow, and in the time between that sync and the next, some of the packages in your todo file were deleted from PyPI. So it seems we get stuck in a loop of always checking whether they have "come back".

I think we could introduce behavior to get out of this loop. But since this has been the behavior for a long time I think we have to gate it via a config option or CLI option.

I'd accept adding a config/CLI option (like --cleanup-todo) to allow deleting of these packages from the todo list if they raise PackageNotFound (aka, are not found on PyPI anymore).
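
The gated behavior proposed above could look roughly like the sketch below. It is not bandersnatch's actual code: `filter_todo`, `fetch_metadata` and the `cleanup_todo` parameter are hypothetical names made up for illustration, and `PackageNotFound` is a local stand-in for the error mentioned in the comment. The only point is to show the decision: with the option off, entries for vanished packages stay queued as they do today; with it on, they are dropped.

```python
import asyncio
from typing import Awaitable, Callable


class PackageNotFound(Exception):
    """Local stand-in for the 'package is gone from PyPI' error."""


async def filter_todo(
    todo: dict[str, int],
    fetch_metadata: Callable[[str, int], Awaitable[None]],
    cleanup_todo: bool = False,
) -> dict[str, int]:
    """Return the todo entries that should remain after one sync pass.

    `todo` maps package name to serial. `fetch_metadata` is a hypothetical
    coroutine that raises PackageNotFound when the package no longer exists.
    """
    remaining: dict[str, int] = {}
    for name, serial in todo.items():
        try:
            await fetch_metadata(name, serial)
        except PackageNotFound:
            if not cleanup_todo:
                # Long-standing behavior: keep the entry and retry next run.
                remaining[name] = serial
            # With the option enabled, the entry is simply dropped.
        except Exception:
            # Any other failure (for example a stale serial) stays queued.
            remaining[name] = serial
    return remaining
```

Keeping the default at `False` mirrors the concern in the comment that the existing behavior has been in place for a long time and should only change when the user opts in.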

Your manual workaround for now is to just remove all the "no longer exists on PyPI" packages from your todo file or just remove your todo file.
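
The manual workaround can be scripted. The following is a minimal sketch, not part of bandersnatch, and it assumes the todo layout shown earlier in this thread (first line is the target serial, each following line is `<package> <serial>`) and the log wording `INFO <package> no longer exists on PyPI`. The function names are made up for this example.

```python
import re
from pathlib import Path

# Matches log lines such as:
#   "... bandersnatch.package: INFO zlkj no longer exists on PyPI"
GONE_RE = re.compile(r"INFO (\S+) no longer exists on PyPI")


def missing_packages(log_text: str) -> set[str]:
    """Collect the package names the log reports as gone from PyPI."""
    return set(GONE_RE.findall(log_text))


def prune_todo(todo_text: str, gone: set[str]) -> str:
    """Drop '<package> <serial>' entries whose package is in `gone`.

    The first line (the target serial) is always kept.
    """
    lines = todo_text.splitlines()
    if not lines:
        return todo_text
    header, entries = lines[0], lines[1:]
    kept = [e for e in entries if e.split() and e.split()[0] not in gone]
    return "\n".join([header, *kept]) + "\n"


def prune_todo_file(todo_path: Path, log_path: Path) -> None:
    """Rewrite the todo file in place, based on what the log reports."""
    gone = missing_packages(log_path.read_text())
    todo_path.write_text(prune_todo(todo_path.read_text(), gone))
```

With the paths from this thread, that would be `prune_todo_file(Path("/yum/pip/todo"), Path("/yum/pip/banderlogfile.log"))`, run while no sync is in progress and ideally after backing up the todo file.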

89ao commented 1 year ago

Looks good to me, looking forward to the update. :)