ipfs-inactive / archives

[ARCHIVED] Repo to coordinate archival efforts with IPFS
https://awesome.ipfs.io/datasets

LibriVox, free public domain audiobooks #189

Open fiatjaf opened 5 years ago

fiatjaf commented 5 years ago

https://librivox.org/

I want to put this on IPFS, but I'm sure I'll not have the necessary disk space, so maybe we could do some work together here.

Or I'll do it myself in the distant future.

smwa commented 5 years ago

I'm looking to help support the adoption of IPFS, but I'm new to the world of archiving. I've always appreciated the work of LibriVox and would be interested in helping.

I suppose the first step is to decide what to archive. I would expect all completed titles, ignoring in-progress and open (requested?) titles. The archive would then be updated as new titles are completed (https://librivox.org/rss/latest_releases could be used if this is done programmatically). There seem to be around 30,000 completed titles.
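A rough sketch of the programmatic update idea, assuming the latest-releases feed is ordinary RSS 2.0 with one `<item>` per title and a `<link>` inside each item (I haven't verified the exact feed layout). The function names and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET


def release_links(rss_xml):
    """Return the <link> text of every <item> in an RSS document.

    Assumes plain RSS 2.0 (no XML namespaces on item/link).
    """
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


def new_releases(rss_xml, already_archived):
    """Return feed links that are not yet in the already-archived set."""
    return [link for link in release_links(rss_xml) if link not in already_archived]


# Invented sample feed, only to show the expected shape of the input.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Example A</title><link>https://librivox.org/example-title-a/</link></item>
<item><title>Example B</title><link>https://librivox.org/example-title-b/</link></item>
</channel></rss>"""
```

Calling `new_releases(SAMPLE_FEED, {"https://librivox.org/example-title-a/"})` would leave only the second link, which is the set of titles still to be fetched.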

The second step is the archive format. Their URLs seem to be based on the titles and should be unique even between different-language versions of the same book, such as https://librivox.org/the-analects-of-confucius/ and https://librivox.org/lun-yu-or-analects-of-confucius-read-in-chinese/. The title part of the URL would likely make a fine directory name.
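Something like this could turn a title URL into a directory name. It is only a sketch: `slug_from_url` is a name I made up, and it assumes the title is always the last path segment, as in the two URLs above.

```python
from urllib.parse import urlparse


def slug_from_url(url):
    """Return the last non-empty path segment of a title URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no title segment in URL: {url!r}")
    return segments[-1]


print(slug_from_url("https://librivox.org/the-analects-of-confucius/"))
print(slug_from_url("https://librivox.org/lun-yu-or-analects-of-confucius-read-in-chinese/"))
```

The two example URLs give different names, so the English and Chinese readings would land in separate directories.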

Archive.org has many or all of these (https://archive.org/details/analects_confucius_1303_librivox), and their directories are very complete for at least some of the titles: https://archive.org/download/analects_confucius_1303_librivox. These include cover art (high and low res), metadata, an .m4b version (all audio in one file that supports bookmarks), a torrent file, and for each chapter: high and low res .mp3, .ogg, a .png of the spectrogram, and a compressed JSON file of audio metrics.
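Going by the two links above, the details page and the file directory seem to share one identifier. A small sketch of mapping one to the other, assuming that `/details/<id>` and `/download/<id>` pattern holds for other titles too (function names are mine):

```python
from urllib.parse import urlparse


def archive_identifier(url):
    """Extract the item identifier from an archive.org details or download URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] not in ("details", "download"):
        raise ValueError(f"not a details/download URL: {url!r}")
    return parts[1]


def download_url(identifier):
    """Build the file-directory URL for an identifier (pattern taken from the links above)."""
    return f"https://archive.org/download/{identifier}"


ident = archive_identifier("https://archive.org/details/analects_confucius_1303_librivox")
print(download_url(ident))
```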

librivox.org has a few pieces of information per title: title, description, cover art, link to author page, link to reader page per chapter, duration, date added, genres, language, list of people who helped and the part each one did, an .m4b version, and an .mp3 per chapter (with chapter title, reader, and duration).

Edit: Link to the API: https://librivox.org/api/feed/audiobooks?format=json&offset=0&limit=100 (docs: https://librivox.org/api/info). Unfortunately the API doesn't include the cover art URL, but that URL looks standardized.
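To walk the whole catalog, the `offset` and `limit` parameters in that URL could be stepped through page by page. A minimal sketch that only builds the page URLs; the total of 30,000 is my rough estimate from above, not a number from the API, and `page_urls` is a made-up helper name.

```python
from urllib.parse import urlencode

API_BASE = "https://librivox.org/api/feed/audiobooks"


def page_urls(total, limit=100):
    """Yield one API URL per page, covering offsets 0, limit, 2*limit, ... below total."""
    for offset in range(0, total, limit):
        query = urlencode({"format": "json", "offset": offset, "limit": limit})
        yield f"{API_BASE}?{query}"


# Rough catalog size; an estimate, so a real crawl should stop on an empty page instead.
urls = list(page_urls(30000))
print(len(urls), urls[0])
```

Each URL could then be fetched and the JSON stored alongside the audio; I'm not assuming anything about the response fields here beyond what the docs page describes.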

Also, an IPFS web app could play these in the browser and keep track of which book you're listening to, if somebody is interested in a sub-project.