purnimagupta / threepress

Automatically exported from code.google.com/p/threepress

Improve handling of encoding errors in CSS #149

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
The following epub encodes its CSS in Latin-1 in a way that can't be
automatically re-encoded as UTF-8. Bookworm doesn't need to handle this, but
it should provide a clearer error message, especially since epubcheck passes
the file.
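
A minimal sketch of what a clearer error could look like, assuming the stylesheet is available as raw bytes. The names `CSSEncodingError` and `decode_css` are hypothetical and are not taken from Bookworm's code; the sketch only shows the idea of reporting the file name and the offending byte instead of a bare traceback.

```python
class CSSEncodingError(ValueError):
    """Hypothetical exception: a stylesheet could not be decoded as UTF-8."""


def decode_css(raw, filename):
    """Decode stylesheet bytes as UTF-8, or fail with a readable message."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid UTF-8
        bad = raw[err.start:err.start + 1]
        raise CSSEncodingError(
            "%s is not valid UTF-8 (byte %r at offset %d); "
            "please re-save the stylesheet as UTF-8"
            % (filename, bad, err.start)
        ) from err
```

A caller could catch `CSSEncodingError` and show its message to the person uploading the book.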

Original issue reported on code.google.com by liza31337@gmail.com on 1 Jun 2009 at 9:06

Attachments:

GoogleCodeExporter commented 8 years ago
The OPF/OPS specs seem ambiguous on what encoding included CSS files MUST be
in, but I don't think it's unreasonable to expect EPUB creators to use UTF-8
for everything. If you'd like to try to do something with Latin-1, you may
be interested in http://chardet.feedparser.org/.

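>>> import chardet
>>> with open('taz_ebook.css', 'rb') as crazy_css:  # raw bytes, not text
...     raw = crazy_css.read()
...
>>> chardet.detect(raw)  # returns a best guess plus a confidence score
{'confidence': 0.88641879382271405, 'encoding': 'ISO-8859-2'}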
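
A detector only offers a statistical guess: the session above reports ISO-8859-2 at roughly 0.89 confidence for a file described in the report as Latin 1. One alternative is to check first for an explicit `@charset` rule, which CSS 2.1 allows at the very start of a stylesheet. The sketch below uses only the standard library. The function name `decode_css_with_fallback` and the choice of ISO-8859-1 as the last resort are assumptions for illustration, not Bookworm behaviour.

```python
import re

# CSS 2.1 expects the rule at the very start of the file, in exactly this form.
_CHARSET_RULE = re.compile(rb'^@charset "([A-Za-z0-9._:-]+)";')


def decode_css_with_fallback(raw, fallback="iso-8859-1"):
    """Return (text, encoding_used) for raw stylesheet bytes."""
    match = _CHARSET_RULE.match(raw)
    if match:
        declared = match.group(1).decode("ascii")
        try:
            return raw.decode(declared), declared
        except (LookupError, UnicodeDecodeError):
            pass  # unknown or incorrect declaration: fall through
    try:
        # "utf-8-sig" also strips a leading byte order mark if present
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail,
        # but the text may be wrong if the file uses another encoding.
        return raw.decode(fallback), fallback
```

Returning the encoding alongside the text lets the caller warn the user whenever the fallback was needed.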

Original comment by abdela...@gmail.com on 2 Jun 2009 at 2:30

GoogleCodeExporter commented 8 years ago
A similar problem happens if the NCX isn't in UTF-8 (Bookworm barfs
gracelessly).
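
Because the NCX is XML, its encoding can be declared in the XML declaration, and a parser given raw bytes will honour it. The sketch below assumes the NCX is read as bytes and uses the standard library's ElementTree. `NCXEncodingError` and `parse_ncx` are hypothetical names, and nothing here reflects how Bookworm actually parses the file.

```python
import xml.etree.ElementTree as ET


class NCXEncodingError(ValueError):
    """Hypothetical exception for an NCX file that cannot be parsed."""


def parse_ncx(raw, filename):
    """Parse raw NCX bytes, honouring the encoding in the XML declaration."""
    try:
        # Passing bytes (not str) lets the parser read the declared encoding
        return ET.fromstring(raw)
    except ET.ParseError as err:
        line, column = err.position
        raise NCXEncodingError(
            "%s could not be parsed at line %d, column %d (%s); "
            "check that its XML declaration matches its actual encoding"
            % (filename, line, column, err)
        ) from err
```

With this approach a correctly labelled Latin-1 NCX parses normally, and only a mislabelled one produces the error.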

Original comment by liza31337@gmail.com on 25 Jun 2009 at 1:40