A few months ago I made a start at extracting content from EPUB files. See the 'extract-epub.js' file in the root of the project.
EPUB files are basically zip archives of HTML (strictly, XHTML) files, so extraction is fairly straightforward.
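One detail worth keeping in mind: the zip itself doesn't say what order the HTML files should be read in. In an EPUB, META-INF/container.xml points to an OPF package document, whose <manifest> lists the files and whose <spine> gives the reading order. The sketch below assumes the OPF text has already been read out of the zip (not shown) and uses regexes only to stay dependency-free, so it assumes unprefixed element names; a real XML parser would be more robust. The function names are hypothetical and not taken from 'extract-epub.js'.

```javascript
// Hypothetical helper: read name="value" (or name='value') from one tag string.
function getAttr(tag, name) {
  const match = tag.match(new RegExp(`\\b${name}\\s*=\\s*(["'])(.*?)\\1`));
  return match ? match[2] : null;
}

// Sketch: given the OPF package document as a string, return the
// XHTML chapter paths in reading order (spine order).
function spineOrder(opfXml) {
  // 1. Map each manifest item id to its href and media type.
  //    The \b stops <item matching <itemref.
  const manifest = new Map();
  for (const [tag] of opfXml.matchAll(/<item\b[^>]*>/g)) {
    const id = getAttr(tag, "id");
    if (id) {
      manifest.set(id, {
        href: getAttr(tag, "href"),
        mediaType: getAttr(tag, "media-type"),
      });
    }
  }
  // 2. Walk the spine's itemrefs in order, keeping only XHTML content.
  const order = [];
  for (const [tag] of opfXml.matchAll(/<itemref\b[^>]*>/g)) {
    const item = manifest.get(getAttr(tag, "idref"));
    if (item && item.mediaType === "application/xhtml+xml") {
      order.push(item.href);
    }
  }
  return order;
}
```

The hrefs are relative to the OPF file's own location inside the zip, so they need resolving against that path before reading each chapter.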
Now that the web app is a bit more mature, and can handle multiple feeds and tweaking the text content of articles, I think it's a good time to introduce EPUB extraction.
Things to think about:
- The 'chapters', which are the HTML files in the EPUB, can be inconsistent. For example, the title page of a 'part', or a dedication at the start of a book, will each be its own 'chapter'. So we need a way to merge and split chapters before creating articles from them.
- I like the idea of adding only a couple of chapters as articles at a time, so you're not converting a whole ebook in one go. That keeps the cost down if you're not sure you'll get through the book.
- We could also add chapters to the feed periodically, e.g. one a day, which turns a book into a series. A single feed could even hold chapters from several books, so you can work through multiple books as a queue.
- I should build this so it also makes sense for other document formats, e.g. PDF and Word docs, rather than making all of it EPUB-specific.
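For the merge/split step, one possible heuristic, assuming each extracted chapter has been reduced to a hypothetical { title, paragraphs } shape: fold any chapter under a minimum word count (title pages, dedications) into the chapter that follows, then split anything over a maximum word count at paragraph boundaries. The names and thresholds are placeholders, and a heuristic like this would likely still need manual overrides, since a genuinely short chapter would also get merged.

```javascript
// Hypothetical chapter shape: { title: string, paragraphs: string[] }

// Count whitespace-separated words across a list of paragraphs.
function wordCount(paragraphs) {
  return paragraphs.join(" ").split(/\s+/).filter(Boolean).length;
}

// Fold chapters shorter than minWords (e.g. a part title page or a
// dedication) into the next chapter, keeping the next chapter's title.
function mergeShortChapters(chapters, minWords) {
  const result = [];
  let pending = null; // short material waiting to be prepended
  for (const chapter of chapters) {
    const combined = pending
      ? { title: chapter.title, paragraphs: [...pending.paragraphs, ...chapter.paragraphs] }
      : chapter;
    if (wordCount(combined.paragraphs) < minWords) {
      pending = combined;
    } else {
      result.push(combined);
      pending = null;
    }
  }
  if (pending) {
    // Short material at the very end is appended to the last full chapter.
    const last = result.pop();
    result.push(
      last ? { title: last.title, paragraphs: [...last.paragraphs, ...pending.paragraphs] } : pending
    );
  }
  return result;
}

// Split a chapter longer than maxWords at paragraph boundaries.
function splitLongChapter(chapter, maxWords) {
  const parts = [];
  let current = [];
  let currentWords = 0;
  for (const paragraph of chapter.paragraphs) {
    const words = wordCount([paragraph]);
    if (current.length > 0 && currentWords + words > maxWords) {
      parts.push(current);
      current = [];
      currentWords = 0;
    }
    current.push(paragraph);
    currentWords += words;
  }
  if (current.length > 0) parts.push(current);
  if (parts.length <= 1) return [chapter];
  return parts.map((paragraphs, i) => ({ title: `${chapter.title} (part ${i + 1})`, paragraphs }));
}

// Merge first, then split, so merged title pages don't create tiny parts.
function normaliseChapters(chapters, { minWords = 200, maxWords = 5000 } = {}) {
  return mergeShortChapters(chapters, minWords).flatMap((c) => splitLongChapter(c, maxWords));
}
```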
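The drip-feed idea can be modelled as a pure function of time: release an initial batch (a couple of chapters), then a fixed number per day from a queue. Here the queue interleaves books round-robin, which is only one option; appending books one after another would give a strict book-after-book queue instead. All names and defaults are hypothetical, and days are counted as whole 24-hour periods since startDate, ignoring time zones.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Build one queue from several books by taking chapter 1 of each book,
// then chapter 2 of each, and so on (round-robin).
// Hypothetical book shape: { id: string, chapters: string[] }
function interleaveBooks(books) {
  const queue = [];
  const longest = Math.max(0, ...books.map((b) => b.chapters.length));
  for (let i = 0; i < longest; i++) {
    for (const book of books) {
      if (i < book.chapters.length) {
        queue.push({ bookId: book.id, chapterIndex: i, title: book.chapters[i] });
      }
    }
  }
  return queue;
}

// Return the queue entries that should be in the feed at time `now`:
// initialCount on the start day, plus perDay for each full day since.
function releasedChapters(queue, startDate, now, { initialCount = 2, perDay = 1 } = {}) {
  const days = Math.floor((now.getTime() - startDate.getTime()) / MS_PER_DAY);
  if (days < 0) return [];
  const count = Math.min(queue.length, initialCount + days * perDay);
  return queue.slice(0, count);
}
```

Because the release set is derived from the start date rather than stored, a periodic job could simply recompute it and create articles for any entries that don't exist yet.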
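To keep things format-agnostic, one option is a small registry in front of the per-format code: each extractor (EPUB now, PDF or Word later) converts a file into the same intermediate { title, chapters } shape, and everything downstream (merging/splitting, scheduling, article creation) only ever sees that shape. The registry, its function names and the toy plain-text extractor are hypothetical; real EPUB, PDF and DOCX extractors would be registered the same way under 'epub', 'pdf' and 'docx'.

```javascript
// Hypothetical shape every extractor must return:
//   { title: string, chapters: Array<{ title: string, paragraphs: string[] }> }
const extractors = new Map();

// Register an extractor function for a file extension (case-insensitive).
function registerExtractor(extension, extractFn) {
  extractors.set(extension.toLowerCase(), extractFn);
}

// Pick an extractor by file extension and run it on the file's data.
function extractDocument(filename, data) {
  const dot = filename.lastIndexOf(".");
  const extension = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  const extract = extractors.get(extension);
  if (!extract) {
    throw new Error(`No extractor registered for ".${extension}" files`);
  }
  return extract(data, filename);
}

// Toy example: plain text where blank lines separate paragraphs and a
// paragraph starting with "# " begins a new chapter.
registerExtractor("txt", (text, filename) => {
  const chapters = [];
  for (const block of text.split(/\n\s*\n/)) {
    const trimmed = block.trim();
    if (!trimmed) continue;
    if (trimmed.startsWith("# ")) {
      chapters.push({ title: trimmed.slice(2), paragraphs: [] });
    } else {
      if (chapters.length === 0) chapters.push({ title: "Untitled", paragraphs: [] });
      chapters[chapters.length - 1].paragraphs.push(trimmed);
    }
  }
  return { title: filename, chapters };
});
```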