hplgit / doconce

Lightweight markup language - document once, include anywhere
http://hplgit.github.io/doconce/doc/web/index.html

encoding hell #64

Closed ischurov closed 8 years ago

ischurov commented 8 years ago

It seems that doconce needs some refactoring to get out of the encoding hell we have now. The current status (as far as I can see after using doconce for a week) is as follows:

I believe that we need a general approach to this to avoid copy-pasting code. The following issues should be addressed:

  1. A wrapper function around open that transparently respects the encoding parameter should exist.
  2. There should be a single handler for the 'ascii' codec can't decode… exception that suggests adding the '--encoding=utf-8' option on the command line.
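The two points above could look roughly like the following sketch. It is not doconce code: the names `ENCODING`, `open_text`, `read_text` and `EncodingError` are hypothetical, and the module-level `ENCODING` variable is only an assumed stand-in for wherever the value of `--encoding=...` ends up being stored.

```python
import io

# Hypothetical module-level setting, assumed to be filled in from the
# --encoding=... command-line option. None means "platform default".
ENCODING = None


class EncodingError(Exception):
    """Hypothetical error type carrying a user-facing hint."""


def open_text(path, mode="r", encoding=None):
    """Open a text file; the encoding is decided in this one place."""
    return io.open(path, mode, encoding=encoding or ENCODING)


def read_text(path, encoding=None):
    """Read a whole file and turn any decode failure into one uniform hint."""
    try:
        with open_text(path, "r", encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        # Single handler: every caller gets the same suggestion.
        raise EncodingError(
            "%s: %s\nTry adding --encoding=utf-8 on the command line."
            % (path, exc)
        )
```

With something like this in place, the scattered `open(...)` calls would go through `open_text` or `read_text`, and the hint about `--encoding=utf-8` would live in exactly one `except` clause.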

The other ways we can consider are:

I'd like to volunteer to fix this (as time permits) once we choose a strategy.

hplgit commented 8 years ago

I totally agree with the description - doconce emerged from pure ascii (pure English texts) and utf-8 support has been just a series of ugly hacks.

I would vote for opening all files as utf-8 as I think that is the best long-term solution. There is significant interest in using doconce for non-English texts, so we should also think about a dictionary for generated expressions such as Table of Contents, Figure, Movie, etc.
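As a rough illustration of the dictionary idea, a lookup table keyed by language could be as small as the sketch below. The names `TRANSLATIONS` and `label`, the key names, and the Norwegian entries are illustrative assumptions, not anything doconce defines.

```python
# -*- coding: utf-8 -*-
# Hypothetical table of generated expressions, keyed by language code.
TRANSLATIONS = {
    "en": {"toc": "Table of Contents", "figure": "Figure", "movie": "Movie"},
    # Illustrative Norwegian entries; a real table would need review.
    "nb": {"toc": "Innholdsfortegnelse", "figure": "Figur", "movie": "Film"},
}


def label(key, language="en"):
    """Return the generated expression for key, falling back to English."""
    table = TRANSLATIONS.get(language, {})
    return table.get(key, TRANSLATIONS["en"][key])
```

For example, `label("figure", "nb")` would give `"Figur"`, while an unknown language falls back to the English text.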

We have to keep --encoding=... for backward compatibility, but we can issue a warning that it is no longer necessary.
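A minimal sketch of that behaviour, assuming an `argparse`-style pre-parse (doconce's actual option handling may differ) and a hypothetical helper named `parse_encoding`. Whether an explicitly given encoding should still be honored is also an assumption here.

```python
import argparse
import warnings

DEFAULT_ENCODING = "utf-8"  # assumed new default for all file I/O


def parse_encoding(argv):
    """Pull --encoding=... out of argv; warn if it was given at all.

    Returns (encoding, remaining_args).
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--encoding", default=None)
    args, rest = parser.parse_known_args(argv)
    if args.encoding is not None:
        warnings.warn(
            "--encoding is no longer necessary; "
            "files are opened as %s by default" % DEFAULT_ENCODING
        )
    # Assumption: an explicit value is still honored for old command lines.
    return args.encoding or DEFAULT_ENCODING, rest
```

Old command lines keep working and produce one warning, while new ones can simply drop the option.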

Regarding Python 3, a common futurized code base is on the to-do list, but a previous attempt (see issue #38) failed due to future's handling of strings for Python 2.

Any attempt from ischurov to fix the encoding hell is much appreciated!