internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Data Dumps not auto-generated for June 2024 #9521

Closed (neilt closed this issue 2 months ago)

neilt commented 3 months ago

Problem

As of today, July 3, https://archive.org/details/ol_exports?sort=-publicdate does not show a June 2024 data dump.

Also, https://openlibrary.org/data/ol_dump_latest.txt.gz still downloads ol_dump_2024-05-31.txt.gz.

The dumps are usually generated by the 1st or 2nd day of the following month.

tfmorris commented 3 months ago

The dump program was recently changed, so this could be related. https://github.com/internetarchive/openlibrary/pull/9127

mekarpeles commented 3 months ago

While running through our guide to diagnosing cron failures (https://github.com/internetarchive/olsystem/wiki/Crons#diagnosing-cron-failures), we discovered:

DEBUG    : stats.py    :  46 :  Postgres Database : coverstore
Exception ignored in atexit callback: <function AtexitIntegration.setup_once.<locals>._shutdown at 0x7fa304df2660>
Traceback (most recent call last):
  File "/home/openlibrary/.local/lib/python3.12/site-packages/sentry_sdk/integrations/atexit.py", line 61, in _shutdown
    client.close(callback=integration.callback)
  File "/home/openlibrary/.local/lib/python3.12/site-packages/sentry_sdk/client.py", line 580, in close
    self.flush(timeout=timeout, callback=callback)
  File "/home/openlibrary/.local/lib/python3.12/site-packages/sentry_sdk/client.py", line 604, in flush
    self.transport.flush(timeout=timeout, callback=callback)
  File "/home/openlibrary/.local/lib/python3.12/site-packages/sentry_sdk/transport.py", line 525, in flush
    self._worker.submit(lambda: self._flush_client_reports(force=True))
  File "/home/openlibrary/.local/lib/python3.12/site-packages/sentry_sdk/worker.py", line 117, in submit
    self._ensure_thread()
  File "/home/openlibrary/.local/lib/python3.12/site-packages/sentry_sdk/worker.py", line 42, in _ensure_thread
    self.start()
  File "/home/openlibrary/.local/lib/python3.12/site-packages/sentry_sdk/worker.py", line 70, in start
    self._thread.start()
  File "/home/openlibrary/.local/lib/python3.12/site-packages/sentry_sdk/integrations/threading.py", line 56, in sentry_start
    return old_start(self, *a, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/threading.py", line 992, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown

cdrini commented 2 months ago

This change was also added recently and might be causing the error: https://github.com/internetarchive/openlibrary/pull/9369

mekarpeles commented 2 months ago

@mekarpeles to manually run the data dumps script, following https://github.com/internetarchive/olsystem/wiki/Crons#monthly-data-dumps

mekarpeles commented 2 months ago

The problem was a missing fi (the keyword that closes a shell if block) in scripts/oldump.sh: https://github.com/internetarchive/openlibrary/blob/master/scripts/oldump.sh#L125-L126
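
For anyone hitting a similar failure, here is a minimal sketch of how an unclosed if block can be caught before a cron run. It uses bash -n, which parses a script without executing it. The two scripts and the check_syntax helper are hypothetical illustrations, not the actual contents of oldump.sh.

```shell
#!/usr/bin/env bash
# Hypothetical demo: broken.sh and fixed.sh are made-up stand-ins,
# not excerpts from scripts/oldump.sh.

tmpdir=$(mktemp -d)

# A script whose if block is never closed (the fi is missing).
cat > "$tmpdir/broken.sh" <<'EOF'
if [ -n "$1" ]; then
    echo "dumping for $1"
echo "done"
EOF

# The same script with the closing fi restored.
cat > "$tmpdir/fixed.sh" <<'EOF'
if [ -n "$1" ]; then
    echo "dumping for $1"
fi
echo "done"
EOF

# check_syntax (hypothetical helper): bash -n reads the file and reports
# syntax errors through its exit status, without running any commands.
check_syntax() {
    if bash -n "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "syntax error"
    fi
}

broken_result=$(check_syntax "$tmpdir/broken.sh")
fixed_result=$(check_syntax "$tmpdir/fixed.sh")

echo "broken.sh: $broken_result"
echo "fixed.sh: $fixed_result"

rm -rf "$tmpdir"
```

Running a check like this in CI or as a pre-commit step is one possible way to flag a missing fi before the monthly job runs; whether it fits the project's workflow is a separate question.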