phfaist / pylatexenc

Simple LaTeX parser providing latex-to-unicode and unicode-to-latex conversion
https://pylatexenc.readthedocs.io
MIT License

double quotes are not converted back to double quotes #59

Closed najtin closed 3 years ago

najtin commented 3 years ago
from pylatexenc.latexencode import unicode_to_latex
from pylatexenc.latex2text import LatexNodes2Text
LatexNodes2Text().latex_to_text(unicode_to_latex("\""))=="\""

I think the error is in unicode_to_latex, because it converts " to ''. I don't really have an idea on how to fix this.

phfaist commented 3 years ago

Hi, thanks for the feedback and yes, this is a consequence of the fact that LaTeX and Unicode aren't one-to-one and there is necessarily a form of loss of information in the conversion. In LaTeX normally you use [``] and [''] to typeset opening and closing double-quotes. Even if we did interpret the ASCII double-quote character ["] as a closing double quote, it could make more sense for unicode_to_latex() to replace it by [''] which is the LaTeX way of writing a closing double quote. If you send that through latex_to_text() again, you'd get the Unicode equivalent for that, [”]=[U+201D Right Double Quotation Mark].

Is there a use case you have in mind where it is important to recover a double-quote after converting both ways from unicode to latex and back? In any case you can customize all of pylatexenc's replacements to best suit your needs, for this check out this page and this page.

I'm closing this issue, feel free to reopen if I'm missing anything.

najtin commented 3 years ago

Well, yes, I have a totally obscure use case. I have to encode JSON in a BibTeX entry as metadata in one of its fields. Please don't ask me why. I use your library for the LaTeX conversion. So on the way back, during parsing, I do something like json.loads(LatexNodes2Text().latex_to_text(string)), but this fails because of the double-quote conversion. The workaround for now is to replace the "special" double quotes with simple double quotes (in the string itself). I understand that this is a very, very niche use case and that changing it might break other use cases, so this is probably an "it's a feature, not a bug" thing.
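
For anyone landing here with the same problem, the workaround is roughly the following. This is only a sketch: restore_ascii_quotes is a name made up for this example, and the round_tripped string is a hand-written stand-in for what latex_to_text() returns, not real library output. Note that it would also rewrite any typographic quotes that legitimately appear inside the JSON string values.

```python
import json

# Translation table: typographic double quotes (U+201C, U+201D) -> ASCII '"'.
_QUOTE_TABLE = str.maketrans({"\u201c": '"', "\u201d": '"'})


def restore_ascii_quotes(text: str) -> str:
    """Replace curly double quotes with the ASCII double quote JSON needs."""
    return text.translate(_QUOTE_TABLE)


# Hand-written stand-in for the text obtained after the LaTeX round trip,
# where every '"' has come back as U+201D (assumption for illustration).
round_tripped = "{\u201dtitle\u201d: \u201dexample\u201d}"

data = json.loads(restore_ascii_quotes(round_tripped))
print(data)
```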

Anyways, thank you for the very helpful links!

phfaist commented 3 years ago

Wow, thanks for the additional info. Well I'm not sure what the best solution for your use case is, but you can certainly override pylatexenc's default conversion rules such that '"' gets mapped either way to whatever you like (or does not get replaced).