MerlijnWajer opened this issue 4 years ago
Which search URL should be used to find epub files in the Internet Archive?
Probably either via openlibrary.org or via the advanced search feature (I will need to change the query, but here's an example):
I think this would work:
The same query in the web interface:
If you want to check what kind of metadata a book has, use archive.org/metadata/<id>/metadata, e.g. https://archive.org/metadata/merchantable00unit/metadata
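A minimal sketch of that lookup in Python, using only the standard library. The URL pattern comes from the comment above; the function names are hypothetical, and the assumption that the endpoint wraps the metadata in a `"result"` key is mine, so the code falls back to the raw JSON object if that key is absent.

```python
import json
import urllib.request


def metadata_url(identifier: str) -> str:
    # Pattern from this thread: archive.org/metadata/<id>/metadata
    return f"https://archive.org/metadata/{identifier}/metadata"


def fetch_metadata(identifier: str) -> dict:
    # Performs a network request; returns the item's metadata as a dict.
    with urllib.request.urlopen(metadata_url(identifier)) as resp:
        data = json.load(resp)
    # Assumption: the response looks like {"result": {...}}; unwrap if so.
    return data.get("result", data)
```

Calling `fetch_metadata("merchantable00unit")` should return the same keys you see when opening the example URL in a browser; any of those keys can then be requested in searches.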
And if you want to include that in the searches, add &fl[]=<metadata key> to the URL.
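A sketch of building such a URL, one `fl[]` parameter per metadata key. The `rows`, `page` and `output=json` parameters are assumptions about advancedsearch.php (they are not spelled out in this thread), and `advanced_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def advanced_search_url(query: str, fields: list, rows: int = 50, page: int = 1) -> str:
    params = [("q", query)]
    # One fl[] entry per metadata key to return for each result.
    params += [("fl[]", field) for field in fields]
    # Assumed paging/format parameters.
    params += [("rows", str(rows)), ("page", str(page)), ("output", "json")]
    # urlencode percent-encodes the brackets as %5B%5D, which PHP decodes back.
    return "https://archive.org/advancedsearch.php?" + urlencode(params)
```

For example, `advanced_search_url("mediatype:texts", ["identifier", "title"])` asks for the identifier and title of each matching item.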
Downloading the epub: archive.org/download/<id>/<id>.epub, e.g. https://archive.org/download/merchantable00unit/merchantable00unit.epub
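The download pattern as a small helper. This assumes, as in the merchantable00unit example, that the epub file is named after the item identifier; items without such a file will simply return an HTTP error. Both function names are hypothetical.

```python
import urllib.request


def epub_url(identifier: str) -> str:
    # Pattern from this thread: archive.org/download/<id>/<id>.epub
    return f"https://archive.org/download/{identifier}/{identifier}.epub"


def download_epub(identifier: str, dest_path: str) -> str:
    # Network call; raises urllib.error.HTTPError if the file does not exist.
    path, _headers = urllib.request.urlretrieve(epub_url(identifier), dest_path)
    return path
```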
You might also want to add AND NOT noindex:* to remove some of the more bogus results.
One more correction: I think AND NOT format:(ACS Encrypted PDF) is required as well. That should ensure we only get free epubs.
So this is likely the right query: https://archive.org/search.php?query=NOT%20format%3A%28ACS%20Encrypted%20EPUB%29%20AND%20NOT%20format%3A%28ACS%20Encrypted%20PDF%29%20AND%20scanningcenter%3A%2A%20AND%20mediatype%3Atexts%20AND%20NOT%20noindex%3A%2A%20sherlock%20holmes
Decoded: NOT format:(ACS Encrypted EPUB) AND NOT format:(ACS Encrypted PDF) AND scanningcenter:* AND mediatype:texts AND NOT noindex:* sherlock holmes
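A sketch that assembles the filter clauses from this thread around arbitrary search terms and percent-encodes the result. With `quote(..., safe="")`, spaces, `:`, `(`, `)` and `*` are encoded exactly as in the search.php link for the sherlock holmes query. The constant and function names are hypothetical.

```python
from urllib.parse import quote

# Clauses collected in this thread to keep only free, scanned texts.
FREE_EPUB_FILTERS = [
    "NOT format:(ACS Encrypted EPUB)",
    "AND NOT format:(ACS Encrypted PDF)",
    "AND scanningcenter:*",
    "AND mediatype:texts",
    "AND NOT noindex:*",
]


def free_epub_query(terms: str) -> str:
    # Plain-text query, as typed into the web interface.
    return " ".join(FREE_EPUB_FILTERS + [terms])


def search_page_url(terms: str) -> str:
    # safe="" makes quote() encode every reserved character, including ':'.
    return "https://archive.org/search.php?query=" + quote(free_epub_query(terms), safe="")
```

`search_page_url("sherlock holmes")` reproduces the link above character for character, so other search terms can be swapped in without hand-encoding.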
This seems like a good test item; it renders fine in Dorian too: https://archive.org/details/masterpiecesofsh02doyl
One more thing: formats other than JSON can also be returned (see https://archive.org/advancedsearch.php), in case that is simpler.
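A sketch of extracting identifiers from two possible response formats. Both shapes are assumptions, not confirmed in this thread: JSON as `{"response": {"docs": [...]}}`, and CSV (for example via `output=csv`) as a header row naming the requested `fl[]` fields followed by one row per item. The function names are hypothetical.

```python
import csv
import io
import json


def identifiers_from_json(text: str) -> list:
    # Assumed shape: {"response": {"docs": [{"identifier": ...}, ...]}}
    docs = json.loads(text).get("response", {}).get("docs", [])
    return [doc["identifier"] for doc in docs if "identifier" in doc]


def identifiers_from_csv(text: str) -> list:
    # Assumed shape: header row with an "identifier" column, then one row per item.
    reader = csv.DictReader(io.StringIO(text))
    return [row["identifier"] for row in reader if row.get("identifier")]
```

The CSV variant needs no JSON handling at all, which may be the simpler route if only identifiers are needed.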
You can also filter with AND title:titlehere or AND creator:authorhere if you want to match against Gutenberg. That would be a nice feature, I think.
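A sketch of appending those title and creator clauses to an existing query. Wrapping the values in double quotes (so multi-word titles are matched as a phrase) and escaping embedded quotes are my assumptions about the search syntax; `with_book_filters` is a hypothetical name.

```python
from typing import Optional


def _phrase(value: str) -> str:
    # Assumption: quoted phrases keep multi-word values together.
    return '"' + value.replace('"', '\\"') + '"'


def with_book_filters(query: str, title: Optional[str] = None,
                      creator: Optional[str] = None) -> str:
    parts = [query]
    if title:
        parts.append("AND title:" + _phrase(title))
    if creator:
        parts.append("AND creator:" + _phrase(creator))
    return " ".join(parts)
```

For instance, `with_book_filters("mediatype:texts", title="The Hound of the Baskervilles", creator="Doyle")` narrows the search to a single known work, which is roughly what matching a Gutenberg entry would need.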