Improve handling of encoding errors in CSS

purnimagupta / threepress

Automatically exported from code.google.com/p/threepress

Improve handling of encoding errors in CSS #149

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago

The attached epub encodes its CSS in Latin-1 in a way that can't be
automatically re-encoded as UTF-8. Bookworm doesn't need to handle this, but
it should provide a better error, especially since epubcheck passes the file.
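
As a rough illustration of what a "better error" could look like, here is a minimal sketch. It assumes the stylesheet is available as raw bytes; `CssEncodingError` and `decode_css` are hypothetical names, not taken from Bookworm's code.

```python
class CssEncodingError(ValueError):
    """Hypothetical error type for illustration; not part of Bookworm."""


def decode_css(raw: bytes, filename: str) -> str:
    """Decode stylesheet bytes as UTF-8, or raise a descriptive error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Report which file failed and where, instead of a bare traceback.
        bad = raw[exc.start:exc.end]
        raise CssEncodingError(
            f"{filename}: stylesheet is not valid UTF-8 "
            f"(bytes {bad!r} at offset {exc.start}); "
            "please re-save the file as UTF-8"
        ) from exc
```

Naming the file and the byte offset gives the epub's creator something to act on, even when Bookworm itself declines to guess the encoding.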

Original issue reported on code.google.com by liza31337@gmail.com on 1 Jun 2009 at 9:06

Attachments:

taz_2009_05_23-2.epub

GoogleCodeExporter commented 8 years ago

The OPF/OPS specs seem ambiguous about which encoding included CSS files MUST
use, but I don't think it's unreasonable to expect EPUB creators to use UTF-8
for everything. If you'd like to try to do something with Latin-1, you may be
interested in http://chardet.feedparser.org/.

>>> import chardet
>>> with open('taz_ebook.css', 'rb') as crazy_css:  # binary mode: chardet.detect() expects bytes
...     raw = crazy_css.read()
...
>>> chardet.detect(raw)
{'confidence': 0.88641879382271405, 'encoding': 'ISO-8859-2'}
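
A detection result like the one above could drive a fallback decode. This is only a sketch, not Bookworm's behavior: `decode_with_guess` is a hypothetical helper, the 0.8 confidence cutoff is an arbitrary assumption, and `guess` is simply a dict shaped like the `chardet.detect()` output.

```python
def decode_with_guess(raw: bytes, guess: dict, min_confidence: float = 0.8) -> str:
    """Try UTF-8 first, then fall back to a detector's guessed encoding.

    `guess` needs 'encoding' and 'confidence' keys, as in the
    chardet.detect() result. The 0.8 default cutoff is an assumption.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass  # not UTF-8; consider the guess below

    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0
    if not encoding or confidence < min_confidence:
        raise ValueError(
            f"not UTF-8, and the guessed encoding {encoding!r} "
            f"(confidence {confidence:.2f}) is too uncertain to use"
        )
    # May still raise LookupError or UnicodeDecodeError if the guess is wrong.
    return raw.decode(encoding)
```

A caller would pass `chardet.detect(raw)` as `guess`. Since a guess can be wrong even at high confidence, the decoded text is best treated as a convenience rather than a guarantee.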

Original comment by abdela...@gmail.com on 2 Jun 2009 at 2:30

GoogleCodeExporter commented 8 years ago

A similar problem happens if the NCX isn't in UTF-8 (Bookworm barfs gracelessly).

Original comment by liza31337@gmail.com on 25 Jun 2009 at 1:40