maemo-leste-extras / dorian

EBook Reader - Read books published in DRM-free EPUB format

Support downloading epubs from archive.org #1

Open MerlijnWajer opened 4 years ago

MerlijnWajer commented 4 years ago

Would be a nice feature I think.

petterreinholdtsen commented 4 years ago

Which search URL should be used to find epub files in the Internet Archive?

MerlijnWajer commented 4 years ago

Probably either openlibrary.org or the advanced search feature. The query will need changing, but here's an example:

https://archive.org/advancedsearch.php?q=collection%3Ainternetarchivebooks&fl%5B%5D=identifier&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=50&page=1&output=json&callback=callback&save=yes

MerlijnWajer commented 4 years ago

I think this would work:

https://archive.org/advancedsearch.php?q=NOT%20format%3A(ACS%20Encrypted%20EPUB)%20AND%20scanningcenter%3A*%20AND%20mediatype%3Atexts%20AND%20NOT%20noindex:*%20AND%20sherlock%20holmes&fl[]=identifier&fl[]=title&fl[]=creator&sort[]=&sort[]=&sort[]=&rows=50&page=1&output=json&callback=callback&save=yes

The same query in the web interface:

https://archive.org/search.php?query=NOT%20format%3A%28ACS%20Encrypted%20EPUB%29%20AND%20scanningcenter%3A%2A%20AND%20mediatype%3Atexts%20sherlock%20holmes

To check what kind of metadata a book has, use archive.org/metadata/<id>/metadata, for example: https://archive.org/metadata/merchantable00unit/metadata

To include one of those metadata fields in the search results, add &fl[]=<metadata key> to the URL.
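A rough Python sketch of the two URL patterns above, in case it helps. The function names are made up for illustration, and the shape of the returned JSON is deliberately not assumed; inspect it before relying on any key.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def metadata_url(identifier):
    """Build the archive.org/metadata/<id>/metadata URL for one item."""
    return f"https://archive.org/metadata/{quote(identifier, safe='')}/metadata"


def fetch_metadata(identifier, timeout=30):
    """Fetch and decode the metadata JSON (hypothetical helper, needs network)."""
    with urlopen(metadata_url(identifier), timeout=timeout) as response:
        return json.load(response)


def field_params(keys):
    """Turn metadata keys into repeated &fl[]=<key> parameters for the search URL."""
    return "".join(f"&fl[]={quote(key, safe='')}" for key in keys)
```

For example, `field_params(["title", "creator"])` gives the suffix to append to an advancedsearch.php URL so those fields appear in each result.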

Downloading the epub:

https://archive.org/download/merchantable00unit/merchantable00unit.epub

archive.org/download/<id>/<id>.epub
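A minimal sketch of that download pattern in Python. `epub_url` and `download_epub` are hypothetical names, and this assumes the item actually has a file at `<id>.epub`; if it does not, `urlopen` raises an `HTTPError`.

```python
import shutil
from urllib.parse import quote
from urllib.request import urlopen


def epub_url(identifier):
    """Build archive.org/download/<id>/<id>.epub for an item identifier."""
    ident = quote(identifier, safe="")
    return f"https://archive.org/download/{ident}/{ident}.epub"


def download_epub(identifier, dest_path, timeout=60):
    """Stream the EPUB to dest_path (needs network; raises on HTTP errors)."""
    with urlopen(epub_url(identifier), timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Something like `download_epub("merchantable00unit", "merchantable00unit.epub")` would then save the book locally.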

MerlijnWajer commented 4 years ago

You might want to also add AND NOT noindex:* to remove some of the more bogus results.

MerlijnWajer commented 4 years ago

One more correction: I think AND NOT format:(ACS Encrypted PDF) is required as well. That should make sure we only get free epubs.

MerlijnWajer commented 4 years ago

So likely, this is the right query: https://archive.org/search.php?query=NOT%20format%3A%28ACS%20Encrypted%20EPUB%29%20AND%20NOT%20format%3A%28ACS%20Encrypted%20PDF%29%20AND%20scanningcenter%3A%2A%20AND%20mediatype%3Atexts%20AND%20NOT%20noindex%3A%2A%20sherlock%20holmes

NOT format:(ACS Encrypted EPUB) AND NOT format:(ACS Encrypted PDF) AND scanningcenter:* AND mediatype:texts AND NOT noindex:* sherlock holmes
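For the JSON API, the same query could be assembled roughly like this. This is only a sketch: the helper names are invented, the search terms are joined with an explicit AND (as in the earlier advancedsearch.php example), and the callback parameter is left out on the assumption that the response is then plain JSON rather than JSONP.

```python
from urllib.parse import urlencode

# Filters copied from the query above; the rest is illustrative.
BASE_FILTERS = [
    "NOT format:(ACS Encrypted EPUB)",
    "NOT format:(ACS Encrypted PDF)",
    "scanningcenter:*",
    "mediatype:texts",
    "NOT noindex:*",
]


def build_query(terms):
    """Join the fixed filters and the user's search terms with AND."""
    return " AND ".join(BASE_FILTERS + [terms])


def advancedsearch_url(terms, fields=("identifier", "title", "creator"),
                       rows=50, page=1):
    """Build an advancedsearch.php URL with one fl[] parameter per field."""
    params = [("q", build_query(terms))]
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)
```

`advancedsearch_url("sherlock holmes")` yields a URL that asks for 50 results per page with identifier, title and creator for each hit.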

MerlijnWajer commented 4 years ago

This seems like a good test item, renders fine in Dorian too: https://archive.org/details/masterpiecesofsh02doyl

MerlijnWajer commented 4 years ago

One more thing: formats other than JSON can also be returned, see https://archive.org/advancedsearch.php

In case that might be simpler.

And you can also filter by AND title:titlehere or AND creator:authorhere if you want to match Gutenberg.
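A small sketch of appending those filters to an existing query string. The function names are hypothetical, and wrapping multi-word values in parentheses is an assumption modelled on the format:(ACS Encrypted EPUB) clause above.

```python
def field_clause(field, value):
    """Format field:value, parenthesising values that contain spaces."""
    value = value.strip()
    if " " in value:
        value = f"({value})"
    return f"{field}:{value}"


def add_filters(query, title=None, creator=None):
    """Append optional AND title:... / AND creator:... clauses to a query."""
    clauses = [query]
    if title:
        clauses.append(field_clause("title", title))
    if creator:
        clauses.append(field_clause("creator", creator))
    return " AND ".join(clauses)
```

For instance, `add_filters("mediatype:texts", creator="doyle")` produces `mediatype:texts AND creator:doyle`.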