Super and Subscript encoding gets removed but content not displayed accordingly

phfaist / pylatexenc

Simple LaTeX parser providing latex-to-unicode and unicode-to-latex conversion

https://pylatexenc.readthedocs.io

MIT License


Super and Subscript encoding gets removed but content not displayed accordingly #69

Closed. Loligplayer33 closed this issue 2 years ago.

Loligplayer33 commented 2 years ago

I have read through this issue: https://github.com/phfaist/pylatexenc/issues/36, which states that you don't support super- and subscripts, for understandable reasons.

However, it would be nice to have the option of keeping the encoding intact.

With LatexNodes2Text().latex_to_text(), a string like:

"This is H$_{2}$O"

Gets converted to:

"This is H_2O"

This is harder to match and replace reliably, for example with a regex.

phfaist commented 2 years ago

Hey, thanks for the issue. I'm not sure I understand what you mean by "keep the encoding complete". What are you trying to achieve? (The LaTeX code you show doesn't look valid, is that a second subscript with a missing argument or do you have a typo? What would be the output you'd have expected?)

Loligplayer33 commented 2 years ago

Thank you for your comment. I forgot to escape the underscore, so it did not display. I will try to specify what I am trying to achieve: I have a BibTeX string that I am trying to convert to text. The problem is that the latex_to_text() function does not seem to recognize the super- and subscript encoding correctly. For example, when I give it the string "This is H$_{2}$O" it returns "This is H2O" instead of either "This is H₂O" or "This is H${2}O". More specifically, I need to keep the encoding, which is not possible at the moment. I thought it might be best to insert some other special character that prevents latex_to_text() from detecting the encoding. This is the BibTeX specification for special characters. (I think you are from Germany, so you should be able to understand it.)

Thank you very much for your help!

phfaist commented 2 years ago

It sounds like the math_mode='verbatim' option of LatexNodes2Text provides the behavior you're after?

from pylatexenc import latex2text

print(latex2text.LatexNodes2Text(math_mode='verbatim').latex_to_text(r"Drink lots of $H_{2}O$."))
# prints:  Drink lots of $H_{2}O$.
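
Once the math is left verbatim, the subscript markup is still in the string and can be matched unambiguously, which was the original concern about regex replacement. The following is a minimal sketch, using only the Python standard library, of how the `$_{2}$` form from the report could be turned into Unicode subscripts afterwards. The names `SUBSCRIPTS`, `SUBSCRIPT_MATH` and `unicode_subscripts` are hypothetical helpers for illustration and are not part of pylatexenc. The pattern assumes the subscript is the only content of the math span and contains only digits or signs.

```python
import re

# Map ASCII digits and signs to their Unicode subscript counterparts.
SUBSCRIPTS = str.maketrans("0123456789+-", "₀₁₂₃₄₅₆₇₈₉₊₋")

# Matches an inline-math span that holds only a subscript, e.g. $_{2}$ or $_{10}$.
SUBSCRIPT_MATH = re.compile(r"\$_\{([0-9+-]+)\}\$")


def unicode_subscripts(text: str) -> str:
    """Replace $_{...}$ spans made of digits or signs with Unicode subscripts."""
    return SUBSCRIPT_MATH.sub(lambda m: m.group(1).translate(SUBSCRIPTS), text)


print(unicode_subscripts("This is H$_{2}$O"))  # This is H₂O
```

Because the pattern requires the `$_{` and `}$` delimiters, plain text such as `file_2` is left alone. A subscript inside a longer math span, such as `$H_{2}O$`, is also left untouched by this sketch and would need a broader pattern.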

Loligplayer33 commented 2 years ago

Works like a charm, thank you. Next time I will look further myself before bothering someone else with my problem.

phfaist commented 2 years ago

Great! No worries.