starryangt / thoth

Filters, downloads, and creates an Epub for web-based content.
MIT License

ePub issues on Shin Translations Site (Wordpress) #1

Closed Trippley closed 8 years ago

Trippley commented 8 years ago

Just tried the new tool on Windows 8.1 to create an epub of specific chapters of The New Gate from the Shin Translations website. I ran thoth.exe on the following URLs:

https://shintranslations.com/vol-2-chapter-1-part-1/
https://shintranslations.com/vol-2-chapter-1-part-2/
https://shintranslations.com/vol-2-chapter-1-part-3/

The file opens fine in Sumatra PDF, but both Play Books and the ePub Validator have trouble with it. The Validator mostly reports

Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

and similar error messages about missing attributes, such as

Error while parsing file 'element "script" missing required attribute "type"'.

Play books just can't process the file.

starryangt commented 8 years ago

The e-readers I use are generally pretty lax about XHTML errors, so I haven't paid it much mind, but yes, the EPUB generation is probably the part I'm least happy with. Essentially, the issue is that I'm just placing the site's HTML directly into an XHTML template I made, and the site's HTML is unlikely to be valid XHTML.
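For illustration only, here is a rough Python sketch of what a cleanup pass between "site HTML" and "XHTML template" could look like. It is not thoth's code; the names `safe_id`, `XhtmlCleaner`, and `clean_fragment` are made up for this example. It addresses only the two validator complaints quoted above (ids containing colons, and `script` elements) plus unclosed void tags such as `<br>`. It does not balance unclosed tags or check attributes against the XHTML schema.

```python
import re
import xml.etree.ElementTree as ET
from html import escape
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "meta", "link", "input"}  # emitted as self-closing
DROP = {"script", "style"}  # removed together with their content


def safe_id(value):
    """Map an arbitrary id to a conservative, colon-free XML name."""
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", value)
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "id_" + cleaned  # names may not start with a digit, '.' or '-'
    return cleaned


class XhtmlCleaner(HTMLParser):
    """Re-serialize lenient HTML as stricter XHTML-style markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        parts = [tag]
        for name, value in attrs:
            value = name if value is None else value  # boolean attributes
            if name == "id":
                value = safe_id(value)
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + (" />" if tag in VOID else ">"))

    def handle_endtag(self, tag):
        if tag in DROP:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_fragment(html):
    cleaner = XhtmlCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


fragment = '<p id="fn:1">a<br>b &amp; c</p><script>var x = 1 < 2;</script>'
cleaned = clean_fragment(fragment)
ET.fromstring(f"<body>{cleaned}</body>")  # raises ParseError if not well-formed
print(cleaned)  # <p id="fn_1">a<br />b &amp; c</p>
```

Being well-formed XML is a weaker condition than passing the ePub Validator, so output like this would still need to be checked against the validator itself.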

I plan on working on a proper solution later, but for now, I added a strict mode, which you can activate with -s. This should eliminate any XHTML spec errors, though right now it has the unfortunate side effect of killing formatting.
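To show why an approach like this loses formatting, here is a hypothetical Python sketch of a text-only pass. How thoth's strict mode actually works is not described in this thread, so the behavior below is an assumption and the names `TextOnly` and `strict_body` are invented for the example. It throws away every tag and attribute, keeps the text, and wraps each block-level run in a plain paragraph, so nothing spec-violating can survive, and neither can bold, italics, or links.

```python
from html import escape
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an assumed, incomplete list).
BLOCK = {"p", "div", "br", "li", "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP = {"script", "style"}  # content dropped entirely


class TextOnly(HTMLParser):
    """Collect text, split into paragraphs at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = [[]]
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BLOCK:
            self.paragraphs.append([])  # start a new paragraph

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK:
            self.paragraphs.append([])

    def handle_data(self, data):
        if not self.skip_depth:
            self.paragraphs[-1].append(data)


def strict_body(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse whitespace inside each paragraph and drop empty ones.
    texts = (" ".join("".join(chunk).split()) for chunk in parser.paragraphs)
    return "\n".join(f"<p>{escape(t, quote=False)}</p>" for t in texts if t)


sample = (
    '<div class="x"><p>Hello <b>bold</b> world</p>'
    "<script>x()</script><p>Second &amp; last</p></div>"
)
print(strict_body(sample))
# <p>Hello bold world</p>
# <p>Second &amp; last</p>
```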

I tested the three links you specified and strict mode seems to appease Play Books at the very least.

(As a side note, Play Books takes ages to process even the smallest of epubs.)