Closed francispeixoto closed 2 months ago
Do these links work for you now, @francispeixoto? We may have an issue whereby the links are broken while the dumps are generating.
Yep both are working now. Thanks!
looks like it still fails from console tho:
$ wget https://openlibrary.org/data/ol_dump_latest.txt.gz
--2024-09-02 16:39:02-- https://openlibrary.org/data/ol_dump_latest.txt.gz
Resolving openlibrary.org (openlibrary.org)... 207.241.234.205
Connecting to openlibrary.org (openlibrary.org)|207.241.234.205|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://archive.org/download/ol_dump_2024-08-31/ol_dump_2024-08-31.txt.gz [following]
--2024-09-02 16:39:02-- https://archive.org/download/ol_dump_2024-08-31/ol_dump_2024-08-31.txt.gz
Resolving archive.org (archive.org)... 207.241.224.2
Connecting to archive.org (archive.org)|207.241.224.2|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ia600800.us.archive.org/13/items/ol_dump_2024-08-31/ol_dump_2024-08-31.txt.gz [following]
--2024-09-02 16:39:03-- https://ia600800.us.archive.org/13/items/ol_dump_2024-08-31/ol_dump_2024-08-31.txt.gz
Resolving ia600800.us.archive.org (ia600800.us.archive.org)... 0.0.0.0, ::
Connecting to ia600800.us.archive.org (ia600800.us.archive.org)|0.0.0.0|:443... failed: Connection refused.
Connecting to ia600800.us.archive.org (ia600800.us.archive.org)|::|:443... failed: Connection refused.
Hmm, the plot thickens. I can use wget to fetch both from https://openlibrary.org/data/ol_dump_latest.txt.gz and https://ia600800.us.archive.org/13/items/ol_dump_2024-08-31/ol_dump_2024-08-31.txt.gz, at least at the minute.
Welp, I tried again on a 5G tether to bypass my firewall and it worked. Looks like I've got an investigation on my hands. Sorry for the worry!
Looks like Pi-hole doesn't like archive.org out of the box. I had to explicitly whitelist it, and now my scripts fetch the file properly.
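For anyone debugging a similar failure: in the wget log above, ia600800.us.archive.org resolved to 0.0.0.0 and ::, which is what a DNS-level blocker typically returns for a blocked name. Here is a minimal Python sketch for checking that from a script. The helper names `resolve` and `is_sinkholed` are made up for illustration, and the check is only a heuristic.

```python
import ipaddress
import socket


def is_sinkholed(addresses):
    """Return True if every resolved address is unspecified (0.0.0.0 or ::)."""
    parsed = [ipaddress.ip_address(a) for a in addresses]
    return bool(parsed) and all(ip.is_unspecified for ip in parsed)


def resolve(hostname, port=443):
    """Return the sorted set of addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hostname taken from the wget log in this thread.
    host = "ia600800.us.archive.org"
    addresses = resolve(host)
    if is_sinkholed(addresses):
        print(f"{host} resolves only to {addresses}; a DNS blocker may be involved")
    else:
        print(f"{host} resolves to {addresses}")
```

If the check reports only unspecified addresses, comparing against a different network or resolver (as with the tether test above) helps confirm that the problem is local.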
At the beginning of each month, while a new data dump is being generated and uploaded to archive.org (which is where the downloads are served from), there is a period when the item containing the files exists but its content is not ready yet. During that window, the "latest" link resolves to this in-progress item, so the download fails. We can probably add logic to keep serving the previous month's dump until the latest one is ready. For now, for anyone who hits this in the future, a workaround is to search archive.org for the historical Open Library dumps and use the most recent working one until the in-progress dump is ready.
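The fallback idea could look roughly like the Python sketch below on the client side. It is not the logic in `data.py`; it is only an illustration. The URL pattern is copied from the redirect shown in the wget log, the function names are hypothetical, and the caller has to supply the candidate dump dates, since this thread does not say how they would be discovered.

```python
import urllib.error
import urllib.request

# Pattern taken from the redirect target seen in this thread.
DUMP_URL = "https://archive.org/download/ol_dump_{date}/ol_dump_{date}.txt.gz"


def head_ok(url, timeout=10):
    """Return True if url answers 200 after redirects; the body is never read."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def latest_ready_dump(dates, is_ready=head_ok):
    """Return the URL of the newest dump that passes is_ready, or None.

    dates are ISO strings (YYYY-MM-DD), so sorting them as text sorts them by time.
    """
    for date in sorted(dates, reverse=True):
        url = DUMP_URL.format(date=date)
        if is_ready(url):
            return url
    return None
```

Passing the readiness check in as a parameter keeps the selection logic testable without network access, for example `latest_ready_dump(["2024-07-31", "2024-08-31"], is_ready=my_check)`, where `my_check` is any function that takes a URL and returns a bool.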
Related links for anyone who may want to explore another workaround:
https://github.com/internetarchive/openlibrary/blob/91bca06dbd23b080b827ca7a273af1eecde48353/openlibrary/plugins/upstream/data.py#L76-L85
https://github.com/internetarchive/openlibrary/blob/91bca06dbd23b080b827ca7a273af1eecde48353/openlibrary/plugins/upstream/data.py#L15-L22
@francispeixoto if you're interested in spending a few moments looking into a solution, we'd appreciate it! That said, we're marking this question as resolved (i.e. problem identified, workaround offered).
Problem
The All Types and Ratings dump file links currently return a 404.
All Types: https://openlibrary.org/data/ol_dump_latest.txt.gz
Ratings: https://openlibrary.org/data/ol_dump_ratings_latest.txt.gz
Reproducing the bug
Expected behavior: The relevant download starts
Actual behavior: The Internet Archive 404 page appears
All Types: https://ia601601.us.archive.org/27/items/ol_dump_2024-08-31/ol_dump_2024-08-31.txt.gz
Ratings: https://ia601601.us.archive.org/27/items/ol_dump_2024-08-31/ol_dump_ratings_2024-08-31.txt.gz