phfaist / pylatexenc

Simple LaTeX parser providing latex-to-unicode and unicode-to-latex conversion
https://pylatexenc.readthedocs.io
MIT License

The "latex_to_text" function cannot convert \sqrt[n]{x} while keeping the root index n #70

Closed halfbottles closed 2 years ago

halfbottles commented 2 years ago

As the title says.

from pylatexenc.latex2text import LatexNodes2Text
latex = r""" ... (\sqrt[25]{10-2.56})-1=8.36% ... """
latex = latex.replace('%', '/100')
text = LatexNodes2Text().latex_to_text(latex)
print(text)

√(10-2.56)-1=8.36/100

The "latex_to_text" function ignores the number 25 in "\sqrt[25]". It converts "\sqrt[25]" to "√", which is wrong. I understand this happens because Unicode has no general n-th root symbol besides "√". So how can I convert LaTeX code like "\sqrt[n]{x}" to "x**(1/n)"? Maybe "Define replacement texts" can solve this, but I wanted you to know about the issue. Your project is great; I wish it and you all the best. Thanks.

phfaist commented 2 years ago

As you say, the reason that the n-th root symbol is not supported by default is that I didn't find a good unicode equivalent.

I don't think using (x)**(1/n) by default would be a good idea, as it would be making an assumption about the semantics of the expression. What I mean is the following. What if the user had defined the notation achieved with \sqrt[n]{x} to mean something completely different? Or what if they had a special convention where \sqrt[0]{x} meant something, but (x)**(1/0) wouldn't? Then, for general expressions, we'd probably have to introduce parentheses to make sure that expressions of the type \sqrt[1+a]{x+y} don't get replaced by x+y**(1/1+a), which means something different from the intended (x+y)**(1/(1+a)).
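The parenthesization concern can be checked directly in Python. The following is a minimal sketch; the values of x, y and a are arbitrary and chosen only for illustration, they do not come from the thread.

```python
# Illustrative values only; x, y and a are not from the original discussion.
x, y, a = 3.0, 5.0, 1.0

# Naive textual substitution of \sqrt[1+a]{x+y}:
# ** binds tighter than +, and 1/1+a parses as (1/1)+a.
naive = x + y ** (1 / 1 + a)          # evaluates as x + y**2

# Wrapping both the radicand and the root index in parentheses
# gives the (1+a)-th root of (x+y).
intended = (x + y) ** (1 / (1 + a))   # square root of 8 for these values

print(naive, intended)
```

With these values the naive form evaluates to 28.0, while the parenthesized form is the square root of 8, so a semantic replacement would need parentheses around both arguments.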

The converter latex2text aims to produce a unicode representation that is in some sense the "least surprising" or "most canonical equivalent" to its input, regardless of semantics. Sometimes that's not possible because of the limitations of the unicode representation. It sounds like you want a converter that preserves the mathematical semantics of your equations. You can achieve that too with latex2text, but you need to redefine the replacement texts. You can get more information about how to achieve that here. Here's an example to get started:

from pylatexenc import latex2text

l2t_context_db = latex2text.get_default_latex_context_db()
def _replace_sqrt(n, l2tobj):
    if n.nodeargd is None or not n.nodeargd.argnlist:
        # edge case: no arguments were parsed
        return '\N{SQUARE ROOT}'
    argnlist = n.nodeargd.argnlist
    if argnlist[0] is not None:
        # optional argument present -> n-th root
        return ( '(' + l2tobj.nodelist_to_text([argnlist[1]]) + ')**(1/('
                 +  l2tobj.nodelist_to_text([argnlist[0]]) + '))' )
    return '\N{SQUARE ROOT}(' + l2tobj.nodelist_to_text([argnlist[1]]) + ')'
l2t_context_db.add_context_category(
    'math-macros-semantic-replacements',
    prepend=True,
    macros=[
        latex2text.MacroTextSpec("sqrt", simplify_repl=_replace_sqrt),
    ]
)

print(
  latex2text.LatexNodes2Text(latex_context=l2t_context_db).latex_to_text(
     r'We have $\sqrt[4]{16} = \sqrt[2]{4} = \sqrt{4} = 2$.'
  )
)
# prints:
#
# We have (16)**(1/(4)) = (4)**(1/(2)) = √(4) = 2.
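Because the replacement text uses Python's ** operator, the printed expressions can be evaluated as ordinary Python arithmetic. The sketch below writes them out by hand rather than calling pylatexenc. The last line assumes that the expression from the original report would be rendered in the same (x)**(1/(n)) form, which has not been verified here.

```python
import math

# Expressions copied from the printed output above.
fourth_root = (16) ** (1 / (4))
square_root = (4) ** (1 / (2))

# Assumed rendering of \sqrt[25]{10-2.56} from the original report,
# following the same (x)**(1/(n)) pattern.
rate = ((10 - 2.56) ** (1 / (25))) - 1

print(fourth_root, square_root, round(rate, 4))
```

Both roots evaluate to 2 up to floating-point precision, and the 25th-root expression comes out at roughly 0.0836, matching the 8.36/100 on the right-hand side of the reporter's equation.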

I agree that the current behavior, which ignores the n-th root argument completely, is counterintuitive. I haven't found any better solution so far.

I'm closing the issue for now, feel free to reopen if you have additional suggestions!

halfbottles commented 2 years ago

Thanks for your comment on my question. I understand and accept your point.