internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Edition download options have broken links b/c file not yet generated #9638

Open scottbarnes opened 3 months ago

scottbarnes commented 3 months ago

Problem

Some Internet Archive items have valid download options (e.g. HOCR) but no DAISY or EPUB files. Those formats can be generated on the fly, but they have not been pre-derived. When a specific download format (e.g. EPUB) has not been pre-derived, even though it is derivable, the Edition page shows broken links for it.

See, e.g.: https://archive.org/details/sierraclubbullet1196sier/

However, the corresponding Edition has broken links for ePub and DAISY, under Download Options, because neither file format has been created in the Internet Archive item: https://openlibrary.org/books/OL25496907M/Sierra_Club_bulletin.

Reproducing the bug

  1. Go to https://openlibrary.org/books/OL25496907M/Sierra_Club_bulletin
  2. Click on ePub or DAISY under the download options.

Context

Notes from this Issue's Lead

Proposal & constraints

Related files

Stakeholders


Instructions for Contributors

This will most likely need an associated issue on the Internet Archive side.

deysandip301 commented 3 months ago

Hey @scottbarnes I would like to work on this issue... please assign this issue to me...:)

scottbarnes commented 3 months ago

@deysandip301, I would be happy to assign this to you, but I don't want to set you up for a frustrating time.

I'm personally not sure how to go about solving this solely from the Open Library side. The issue is that sometimes the EPUB and DAISY files are there and the links work, and other times they are not, and the mechanism to create them is currently some buttons on the Internet Archive item page.

I suppose one strategy might be to use fetch to pre-check the download URLs for the EPUB and DAISY files, and then redirect to the Internet Archive item page if the status code is 404. However, I got CORS errors with both a HEAD and a GET request, so that might require a server-side check, and that's supposing this is the solution we want.
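To make the server-side variant of that idea concrete, here is a minimal sketch using only the Python standard library. It is not existing Open Library code: the names `head_status` and `choose_link` are hypothetical, and falling back to the item page on a 404 is just the strategy floated above, not a decided solution.

```python
import urllib.error
import urllib.request
from typing import Callable


def head_status(url: str, timeout: float = 5.0) -> int:
    """Send a HEAD request to ``url`` and return the resulting status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; the status code is on the exception.
        return err.code


def choose_link(
    download_url: str,
    item_url: str,
    get_status: Callable[[str], int] = head_status,
) -> str:
    """Return ``download_url``, or ``item_url`` if the download responds with 404.

    ``get_status`` is injectable so the decision logic can be exercised
    without touching the network.
    """
    try:
        status = get_status(download_url)
    except OSError:
        # Assumption: on a network failure (timeout, DNS error, ...) keep the
        # original link rather than guess that the file is missing.
        return download_url
    return item_url if status == 404 else download_url
```

Because the check runs on the server, the browser's CORS restrictions do not apply. A caller would pass the format-specific download link as `download_url` and the item page (e.g. https://archive.org/details/sierraclubbullet1196sier/) as `item_url`. Checking on every page render would add latency, so a real implementation would probably want to cache the result.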

Perhaps what would be most helpful is to propose a solution, or a few, and see whether after some discussion we (where 'we' is vaguely defined, and I am not yet sure who we'd need to include in that decision) can come up with an acceptable plan of action.

Edit: I added the Needs: Detail tag for now, even though that's probably not the correct one, but for now I'll leave it at that as nothing seems directly on point.

cdrini commented 2 months ago

Options are (for staff to decide):

Jash2606 commented 2 months ago

Hey @scottbarnes, I would like to work on this issue. Could you please assign it to me?

scottbarnes commented 2 months ago

Alas, though I admire your enthusiasm, @Jash2606, this one is stuck pending a staff decision, so it would be hard to work on, as even we're not sure what we want to do at the moment. Sorry! :)