siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License

`$` parsing in verbatim #35

Closed James-Yu closed 1 year ago

James-Yu commented 1 year ago

This issue originates from https://github.com/James-Yu/LaTeX-Workshop/issues/3922 .

When unified-latex parses `$` inside verbatim environments, the parsing time grows exponentially.

The root cause is two-fold:

  1. There is no support for the lstlisting package in unified-latex-ctan. I am happy to work on it, but
  2. Currently unified-latex won't gobble arguments if they are defined as `v` or `+v` (verbatim, as in xparse).

I'm not sure how to work on the second point. Any idea?

siefkenj commented 1 year ago

Exponential parsing may be tricky to solve... For 2, look here:

https://github.com/siefkenj/unified-latex/blob/98831af88bdb8136437fa2bb2a0daf152740df16/packages/unified-latex-util-arguments/libs/gobble-single-argument.ts#L96

It appears that `"verbatim"`, as specified at https://github.com/siefkenj/unified-latex/blob/98831af88bdb8136437fa2bb2a0daf152740df16/packages/unified-latex-util-argspec/libs/argspec-types.ts#L24 , is not dealt with in the code. That shouldn't be too hard to add, though getting a reference to the original source to make sure it wasn't modified at all might be annoying...
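For illustration only, here is a rough sketch of what gobbling an xparse-style `v`/`+v` argument involves when working on the raw source string. This is not code from unified-latex: `gobbleVerbatimArg`, `VerbatimArg`, and the `allowNewlines` flag are hypothetical names, and the real implementation would have to work with AST nodes and their source positions rather than a plain string.

```typescript
// Hypothetical helper, NOT part of unified-latex.
// Reads a verbatim argument starting at `start` in the raw source.
interface VerbatimArg {
    content: string; // raw text between the delimiters, left untouched
    end: number; // index just past the closing delimiter
}

function gobbleVerbatimArg(
    source: string,
    start: number,
    allowNewlines = false // assumption: false models `v`, true models `+v`
): VerbatimArg | null {
    if (start < 0 || start >= source.length) {
        return null;
    }
    const open = source.charAt(start);
    let close = -1;
    if (open === "{") {
        // Braced form: scan to the matching close brace, tracking nesting.
        let depth = 0;
        for (let i = start; i < source.length; i++) {
            const c = source.charAt(i);
            if (c === "{") {
                depth++;
            } else if (c === "}") {
                depth--;
                if (depth === 0) {
                    close = i;
                    break;
                }
            }
        }
    } else {
        // Delimiter form (e.g. |...|): ends at the next identical character.
        close = source.indexOf(open, start + 1);
    }
    if (close === -1) {
        return null; // unterminated argument
    }
    const content = source.slice(start + 1, close);
    if (!allowNewlines && content.indexOf("\n") !== -1) {
        return null; // a plain `v` argument may not span lines
    }
    return { content, end: close + 1 };
}
```

With this sketch, `gobbleVerbatimArg("|a$b| rest", 0)` yields the content `a$b` without the `$` ever being interpreted as a math delimiter, which is the behaviour a `v` argument needs.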

As for true verbatim environments, they are handled at the PEGjs grammar level. See: https://github.com/siefkenj/unified-latex/blob/98831af88bdb8136437fa2bb2a0daf152740df16/packages/unified-latex-util-pegjs/grammars/latex.pegjs#L108 where both the `\verb` macro and the `verbatim` environment are listed. I think new verb/verbatim constructs must be defined in the grammar rather than as a package if one needs to avoid parsing the contents of the verb/verbatim entirely.
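To make the "avoid parsing the contents entirely" idea concrete, here is a minimal sketch that carves verbatim-like environments out of the raw source before any tokenizing happens. It is plain string scanning, not the PEGjs grammar and not unified-latex code; `splitVerbatim` and `Chunk` are hypothetical names, and the list of environment names is supplied by the caller as an assumption. It also does not handle optional arguments such as `\begin{lstlisting}[language=...]`, which would end up inside the verbatim body here.

```typescript
// Hypothetical pre-pass, NOT part of unified-latex.
type Chunk =
    | { kind: "latex"; text: string }
    | { kind: "verbatim"; env: string; text: string };

function splitVerbatim(source: string, envNames: string[]): Chunk[] {
    const chunks: Chunk[] = [];
    let pos = 0;
    while (pos < source.length) {
        // Find the earliest \begin{env} among the requested environments.
        let best = -1;
        let bestEnv = "";
        for (const env of envNames) {
            const idx = source.indexOf("\\begin{" + env + "}", pos);
            if (idx !== -1 && (best === -1 || idx < best)) {
                best = idx;
                bestEnv = env;
            }
        }
        if (best === -1) {
            // No more verbatim environments: the rest is ordinary LaTeX.
            chunks.push({ kind: "latex", text: source.slice(pos) });
            break;
        }
        if (best > pos) {
            chunks.push({ kind: "latex", text: source.slice(pos, best) });
        }
        const bodyStart = best + ("\\begin{" + bestEnv + "}").length;
        const endTag = "\\end{" + bestEnv + "}";
        const endIdx = source.indexOf(endTag, bodyStart);
        if (endIdx === -1) {
            // Unterminated environment: treat everything after it as verbatim.
            chunks.push({ kind: "verbatim", env: bestEnv, text: source.slice(bodyStart) });
            break;
        }
        chunks.push({ kind: "verbatim", env: bestEnv, text: source.slice(bodyStart, endIdx) });
        pos = endIdx + endTag.length;
    }
    return chunks;
}
```

Because the body of each matched environment is copied as an opaque string, a `$` inside it is never seen by whatever parses the `"latex"` chunks, and the scan never backtracks past a position it has already consumed.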