internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Followup to #7478: Move Cover Tars -> Zips #9560

Closed by mekarpeles 3 weeks ago

mekarpeles commented 2 months ago

Problem


Open Library relies on Archive.org as a storage layer for millions of book covers and has a resolver at covers.openlibrary.org which looks in the Open Library cover DB for a cover and determines where it lives (either on OL disk or in an archive.org item).

Right now we're seeing what appears to be high-volume, programmatic access of our covers. Instead of downloading an entire tar, these clients are rapidly downloading one cover at a time. This is causing performance issues on Archive.org nodes because reading a single file out of a tar is much slower than reading it out of a zip.
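
To illustrate why per-cover access is cheaper from a zip, here is a minimal, self-contained sketch using only Python's standard library. It is not how Archive.org serves files; the member names (`0000000042.jpg` and so on) and the helper names are made up for the example. A zip carries a central directory, so a reader can look up one member and seek to it, whereas a tar has no index and its headers have to be scanned in order.

```python
import io
import tarfile
import zipfile


def build_archives(n_members: int = 100):
    """Build an in-memory tar and zip holding the same dummy 'covers'.

    Names and payloads are hypothetical stand-ins for real cover files.
    """
    tar_buf, zip_buf = io.BytesIO(), io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf, \
            zipfile.ZipFile(zip_buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for i in range(n_members):
            name = f"{i:010d}.jpg"
            payload = f"cover-{i}".encode()
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
            zf.writestr(name, payload)
    return tar_buf.getvalue(), zip_buf.getvalue()


def read_from_tar(tar_bytes: bytes, name: str) -> bytes:
    """Read one member from a tar.

    tarfile has no index to consult, so getmember() reads through the
    member headers of the archive before it can return the entry.
    """
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tf:
        member = tf.getmember(name)
        return tf.extractfile(member).read()


def read_from_zip(zip_bytes: bytes, name: str) -> bytes:
    """Read one member from a zip.

    zipfile loads the central directory once, then seeks straight to
    the requested member's data.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(name)
```

Both helpers return the same bytes for the same name; the difference is only in how much of the archive has to be read to find the member, which grows with the size of the tar.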

We should block this high-volume access first as a stopgap, but either way we should prioritize moving these tars to zips.
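
As a rough idea of what moving a tar to a zip involves, here is a local sketch in Python. It is an assumption-laden illustration, not the actual Archive.org tooling: `tar_to_zip` is a hypothetical helper, it works on local file paths, and it stores members uncompressed on the assumption that cover images are already compressed.

```python
import tarfile
import zipfile


def tar_to_zip(tar_path: str, zip_path: str) -> int:
    """Copy every regular file in a tar into a new zip; return the count.

    Hypothetical helper for illustration only. Member names are kept
    unchanged so existing lookups by filename would still resolve.
    """
    count = 0
    with tarfile.open(tar_path, mode="r") as tf, \
            zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for member in tf:  # iterates members in archive order
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            data = tf.extractfile(member).read()
            zf.writestr(member.name, data)
            count += 1
    return count
```

A real migration would also need to verify the new zip against the original tar before the tar is removed, which this sketch leaves out.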

Prerequisites

Phase I

The strategy is:

Phase II

There is one exceptional case, which is that there is a batch of tars on disk that we'd like to be zips. One solution (while not ideal) is to perform this same process, to:

Proposal & Constraints

What is the proposed solution / implementation?

Is there a precedent of this approach succeeding elsewhere?

Which suggestions or requirements should be considered for how feature needs to appear or be implemented?

Leads

Related files

Stakeholders


Instructions for Contributors

hbromley commented 2 months ago

See Jira issue PBOX-3879 for the creation of the fixer op that would make the zips.

cdrini commented 1 month ago

Marking as Blocked while we wait for PBOX-3879.

hbromley commented 1 month ago

Jira issue PBOX-3879 is now closed, and the new fixer op is currently being deployed to the PRI servers where it would run.

Let me know if you need any help running it on all your items that contain tars.