Open hmltn-0 opened 2 years ago
Hello, this issue was sent to the Publishing at W3C Community Group. I think all you need to do is join that community group, and we can then help you in your efforts.
Best, George
Perhaps you could tell us more about what you want to do? EPUB consists largely of HTML, so rendering an EPUB would involve presenting HTML to the end user.
Thanks for your message.
I want to write my own Python script which extracts the text from an EPUB.
To do this I need to understand how the files are arranged. Are they in order in the directory?
Is it as simple as getting all “p” and “h” tags or are there certain kinds of tags that contain text and certain ones that do not?
Thank you
> To do this I need to understand how the files are arranged. Are they in order in the directory?
Not necessarily. The components of the EPUB are listed in the manifest of the XML package file, and their reading order is given by its spine. A basic introduction to EPUB can be found in the overview.
The spec defining the file format is here
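To make the package file idea concrete, here is a minimal sketch using only the Python standard library. It assumes a well-formed, unencrypted EPUB: it reads `META-INF/container.xml` to locate the package file, then maps each spine `itemref` to its manifest `href`. The function name `spine_paths` is made up for this example, and error handling is left out.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# XML namespaces used by the container file and the package file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_paths(epub):
    """Return the content document paths of an EPUB in reading order.

    `epub` is a filename or file-like object accepted by zipfile.ZipFile.
    (Hypothetical helper for illustration; no error handling.)
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml points at the package file via its full-path attribute.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))

    # The manifest maps item ids to hrefs relative to the package file.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }

    # The spine lists item ids in reading order.
    base = posixpath.dirname(opf_path)
    return [
        posixpath.normpath(posixpath.join(base, unquote(manifest[ref.attrib["idref"]])))
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]
```

Each returned path can then be passed to `ZipFile.read()` to get the XHTML for that chapter, in the order a reading system would present it.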
> Is it as simple as getting all “p” and “h” tags or are there certain kinds of tags that contain text and certain ones that do not?
I would highly recommend using an HTML parsing library in Python. I have personal experience with Beautiful Soup.
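As a rough illustration of why a parser helps: text can also sit in list items, table cells, block quotes and so on, so collecting only “p” and “h” tags may miss content. The sketch below uses the standard library's `html.parser` as a dependency-free stand-in for Beautiful Soup (which would do the same job with less code). The class name, the function name and the two tag sets are assumptions chosen for this example, not a complete list.

```python
from html.parser import HTMLParser

# Elements whose contents are not reader-visible text (assumed set).
SKIP = {"script", "style", "head"}
# Elements after which a line break is emitted (assumed, not exhaustive).
BLOCK = {"p", "div", "li", "tr", "blockquote", "section",
         "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect visible text, starting a new line after block-level elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a SKIP element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_text(markup):
    """Return the visible text of one content document, one block per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    lines = "".join(parser.chunks).splitlines()
    # Collapse runs of whitespace and drop lines that end up empty.
    cleaned = (" ".join(line.split()) for line in lines)
    return "\n".join(line for line in cleaned if line)
```

Feeding it the decoded markup of each content document in reading order, and joining the results, would give the plain text of the whole book.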
Good luck!
I’d like to write a script to render EPUB with Python.
Is there an official EPUB community mailing list where I can learn more about how EPUBs are rendered?
Thank you very much