phfaist / pylatexenc

Simple LaTeX parser providing latex-to-unicode and unicode-to-latex conversion
https://pylatexenc.readthedocs.io
MIT License

Parsing commands containing "@" #74

Closed ckorzen closed 2 years ago

ckorzen commented 2 years ago

Hey,

I need to parse TeX code of the following form:

\makeatletter
\renewcommand{\@xyz}{Some text}
\makeatother

pylatexenc splits the \@xyz part into \@ and xyz:

LatexMacroNode([...], macroname='@')
LatexCharsNode([...], chars='xyz')

Is there an option to consider the @ as a valid part of a macro name, so that \@xyz won't be split?

phfaist commented 2 years ago

Great suggestion! This feature is planned for an upcoming pylatexenc version 3.0.

ckorzen commented 2 years ago

I would like to apply a quick fix myself. I had a look at the code and found lines 502-514 in pylatexenc/latexwalker/_walker.py:

if s[pos] == '\\':
    # escape sequence
    if pos+1 >= len(s):
        raise LatexWalkerEndOfStream()
    macro = s[pos+1] # next char is necessarily part of macro
    # following chars part of macro only if all are alphabetical
    isalphamacro = False
    i = 2
    if s[pos+1].isalpha():
        isalphamacro = True
        while pos+i<len(s) and s[pos+i].isalpha():
            macro += s[pos+i]
            i += 1

Is it enough to change line 512 to something like while pos+i<len(s) and (s[pos+i].isalpha() or s[pos+i] == "@")? Granted, that would only be a quick fix, but that is OK for me.

phfaist commented 2 years ago

For a quick patch, I would replace both .isalpha() checks (s[pos+1].isalpha() in the middle if and s[pos+i].isalpha() in the last while) with calls to an external function: is_long_macro_name_char(s[pos+1]) and is_long_macro_name_char(s[pos+i]), respectively. You can then define the function as def is_long_macro_name_char(c): return c.isalpha() or c == "@". I didn't test this, but I think it should work.
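
Written out, the suggested patch might look roughly like the sketch below. It is a standalone illustration, not the actual pylatexenc code: read_macro_name is a hypothetical helper that mirrors only the excerpt quoted above, and it raises ValueError where the library raises LatexWalkerEndOfStream, so that it runs without pylatexenc installed.

```python
def is_long_macro_name_char(c):
    """Return True if `c` may appear in a multi-character macro name."""
    return c.isalpha() or c == "@"


def read_macro_name(s, pos):
    """Hypothetical standalone version of the patched loop.

    Expects s[pos] to be a backslash and returns (macro_name, chars_consumed),
    where chars_consumed includes the backslash itself.
    """
    if s[pos] != "\\":
        raise ValueError("expected a backslash at position %d" % pos)
    if pos + 1 >= len(s):
        # the library raises LatexWalkerEndOfStream here
        raise ValueError("input ends right after the backslash")
    macro = s[pos + 1]  # next char is necessarily part of the macro
    i = 2
    # first check: the character directly after the backslash
    if is_long_macro_name_char(s[pos + 1]):
        # second check: every following character, one at a time
        while pos + i < len(s) and is_long_macro_name_char(s[pos + i]):
            macro += s[pos + i]
            i += 1
    return macro, i


print(read_macro_name(r"\@xyz{Some text}", 0))  # ('@xyz', 5)
```

Note that this treats "@" as a letter everywhere, not only between \makeatletter and \makeatother, so a plain \@ followed directly by letters would also be read as one long macro name.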

ckorzen commented 2 years ago

Works! Thanks a lot.