Open mitra42 opened 6 years ago
Some previous notes on this: see Richard Caceres' recent Slack chat about the new version (~/git/ia_bookreader): https://github.com/internetarchive/bookreader
Brewster mentioned you were using bookreader, and I should let you know there's a new version that's much easier to use outside of IA.
---- OLD notes ----
Richard sent some links in Skype. Need to either a) use Mek's IIIF reader, or b) use the bookreader: get the JSON file (could decentralize), and then inside JSIA is a way to get the page.
view source here: https://archive.org/stream/10_PRINT_121114#page/n0/mode/2up
bookreader initialization library: https://archive.org/bookreader/BookReaderJSIA.js?v=aHe9koCh
Notes from revisiting this?
Open questions:
[ ] How to view PDFs, and/or how to make the .jpg's
Research steps:
[ ] Look at BookReaderDemo/demo-simple.html and BookReaderJSSimple.js in https://github.com/internetarchive/bookreader
What I've found: the main JSON control file is [https://openlibrary.org/query.json?type=/type/edition&*=&ocaid=zandvoort.newspapers.1992.zandvoorts.nieuwsblad&callback=jQuery110207786013323137531_1545886524531&_=1545886524532], which says it's application/javascript but is actually application/json. [Question posed to Richard] It's not clear to me how to pass this to bookreader.
It contains URLs like [https://ia802605.us.archive.org/BookReader/BookReaderImages.php?zip=/9/items/zandvoort.newspapers.1992.zandvoorts.nieuwsblad/1992.Zandvoorts.Nieuwsblad_jp2.zip&file=1992.Zandvoorts.Nieuwsblad_jp2/1992.Zandvoorts.Nieuwsblad_0000.jp2] for page 0. It's not clear to me whether these are formulaic, but it probably doesn't matter. dweb-mirror should be able to pull the zip and then edit the URLs in the control file before passing it to bookreader; dweb-archive would also have to intercept the point where BookReader fetches these files.
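The "edit the URLs in the control file" step for dweb-mirror could look roughly like the sketch below. It is only an illustration: `LOCAL_BASE`, the `/bookreader-images` route, and the function names are hypothetical and not part of dweb-mirror or BookReader, and it assumes the image URLs keep the `zip=` and `file=` query parameters seen in the example above.

```javascript
// Hypothetical local server address; not a real dweb-mirror setting.
const LOCAL_BASE = 'http://localhost:8080';

// Rewrite one BookReaderImages.php URL to a (hypothetical) local route,
// carrying over the zip= and file= parameters. Anything else is returned unchanged.
function rewriteImageUrl(url, localBase = LOCAL_BASE) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (e) {
    return url; // not an absolute URL, leave it alone
  }
  if (!parsed.pathname.endsWith('/BookReader/BookReaderImages.php')) return url;
  const zip = parsed.searchParams.get('zip');
  const file = parsed.searchParams.get('file');
  if (!zip || !file) return url;
  const local = new URL('/bookreader-images', localBase);
  local.searchParams.set('zip', zip);
  local.searchParams.set('file', file);
  return local.toString();
}

// Walk an arbitrary parsed-JSON value and rewrite every string that
// looks like an image URL; other values are copied through as-is.
function rewriteControl(value, localBase = LOCAL_BASE) {
  if (typeof value === 'string') return rewriteImageUrl(value, localBase);
  if (Array.isArray(value)) return value.map((v) => rewriteControl(v, localBase));
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, rewriteControl(v, localBase)])
    );
  }
  return value;
}
```

Walking the whole structure avoids depending on the exact layout of the control file, which is still an open question here. The dweb-archive case would need a different hook, since there the fetch itself has to be intercepted.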
There is a strange URL [https://openlibrary.org/query.json?type=/type/edition&*=&ocaid=WillieLynchLetter1712&callback=jQuery11020018995238347655485_1545885427175&_=1545885427176] which says it's application/json but actually returns application/javascript.
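Both URLs carry a jQuery-style `callback=` parameter, which suggests the body may be JSONP (the JSON wrapped in a function call) rather than bare JSON; that could explain the content-type confusion, though it has not been confirmed here. A minimal sketch that accepts either form, with `unwrapJsonp` being a hypothetical helper name:

```javascript
// Parse a response body that is either plain JSON or JSONP of the
// form callbackName(<json>); and return the decoded value.
function unwrapJsonp(body) {
  const text = body.trim();
  // Plain JSON documents of interest start with { or [
  if (text.startsWith('{') || text.startsWith('[')) return JSON.parse(text);
  // JSONP: identifier, opening paren, payload, closing paren, optional semicolon
  const match = text.match(/^[\w$.]+\s*\(([\s\S]*)\)\s*;?$/);
  if (!match) throw new SyntaxError('Response is neither JSON nor JSONP');
  return JSON.parse(match[1]);
}
```

Parsing the payload with `JSON.parse` rather than evaluating the script avoids running remote code, which matters if the response is later served from a mirror or a decentralized transport.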
Options
I’m trying to figure out a strategy to do this in both the Dweb and the offline case; it's tricky in both.
For dweb.archive.org I think I have to ….
For dweb-mirror (offline) where there is a local server.
Done: ./crawl.js --level all zandvoort.newspapers.1992.zandvoorts.nieuwsblad but it missed the big files (>700MB for the zip)
(Note to self - see EN/Dweb - Archive - Text)
For an example of a text item with multiple "books", try https://archive.org/details/ialerequestsummary (each book is one page).
EDITED: Background info: multipage books thetaleofpeterra14838gut and alicesadventures19033gut are reasonably small but are displaying as a slide carousel. [https://archive.org/search.php?query=mediatype:texts%20AND%20imagecount:8] shows small ones, and unitednov65unit is an example.
[ ] Figure out what switches between slide carousel and bookreader
From Jeff Kaplan: typically if an item is mediatype=texts and there is an ABBYY and a PDF file then it will result in a bookreader presentation. Loose images would not result in a PDF or bookreader presentation, and an item with ABBYY and PDF that is not mediatype=texts would have no bookreader presentation; it would need to be mediatype=texts.
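Jeff's rule of thumb can be written down as a small predicate. This is a sketch only: the item shape (`mediatype` plus a list of file format names) and the substring matching on "abby" and "pdf" are assumptions for illustration, not the actual logic archive.org uses to choose a presentation.

```javascript
// Returns true when the rule of thumb predicts a bookreader presentation:
// mediatype=texts AND an ABBYY file AND a PDF file are all present.
// `item` is a hypothetical shape: { mediatype: string, formats: string[] }.
function wouldUseBookReader(item) {
  const formats = (item.formats || []).map((f) => f.toLowerCase());
  const hasAbbyy = formats.some((f) => f.includes('abby'));
  const hasPdf = formats.some((f) => f.includes('pdf'));
  return item.mediatype === 'texts' && hasAbbyy && hasPdf;
}

// Example with made-up format names:
console.log(wouldUseBookReader({ mediatype: 'texts', formats: ['Abbyy GZ', 'Text PDF'] }));
```

A `false` result only says the bookreader is not expected; whether the item then falls back to the slide carousel is a separate question.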
See #109 for a failure case (Peter Rabbit) that should use the slide carousel
The system currently supports text through Richard's player; at some point it needs to work with that player to allow it to be decentralized.