siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License
87 stars 20 forks source link

Parsing takes absurdly long time for some simple documents #115

Open squ1dd13 opened 1 week ago

squ1dd13 commented 1 week ago

Hi,

I opened #4429 over at LaTeX Workshop regarding significant performance issues with AST parsing. From my experiments it looks like \newenvironment in particular is able to cause these problems. I was told to come here because unified-latex is used by LaTeX Workshop for parsing. The details are all in the linked issue, including a simple example. As I wrote in that issue, I would be happy to privately share a short document which takes upwards of a minute to parse if this would be useful.

Thanks :)

siefkenj commented 5 days ago

I don't think this is caused by the parser itself, but possibly some further processing steps?

If you put your sample document from #4429 into https://siefkenj.github.io/latex-parser-playground/ it parses almost instantly.

squ1dd13 commented 5 days ago

I've just tried that and indeed it is practically instant, but then I don't know what's going on because the long times that LaTeX Workshop reports seem to be calculated across a call to unified-latex: unifiedParser.parse(...) via parseLaTeX(...) via parse.tex(...). I don't know TS very well but I don't immediately see much in there that could cause such high CPU usage other than the parser, unless there's somehow something weirdly specific to my machine going on here.

@James-Yu are you able to reproduce the issue in LaTeX Workshop? If not, I'll just close this issue since it must be on my end, but if you can reproduce it, I guess we go back to the original issue and close this one? Unless you still think the bug is with unified-latex?

Sorry to discuss LaTeX Workshop specifics on an issue for unified-latex, but at this point I don't know which project the bug is in. Thanks to both of you for your help so far.

James-Yu commented 4 days ago

I can reproduce the problem with LaTeX Workshop, and the playground parsed the content very fast. However, it's quite unlikely an issue related to LW.

I do think we can continue the investigation back in the other issue and close this one for now, until the root cause is identified.

squ1dd13 commented 12 hours ago

(For anyone else: We're back from LW #4429 again.)

I've tried this on various different builds and the problem seems to have been introduced between v1.6.1 and v1.7.0. On v1.6.1, this script shows

parsing took 19 ms

whereas on v1.7.0, I get

parsing took 8051 ms
squ1dd13 commented 12 hours ago

On further investigation, it seems to have started in 05ed77ee5e1f7f48351af121ca34d94c0926b7cc "Switch to Vite for building".

To double check, I made two fresh clones, checked out c8213f85b58882204a8a3baac047c4791af0d618 on one and 05ed77ee5e1f7f48351af121ca34d94c0926b7cc on the other, then on each:

mkdir examples
cp /path/to/parser-choke.ts examples/
npm install
npm run build -ws
npx vite-node examples/parser-choke.ts

On the c8213f85b58882204a8a3baac047c4791af0d618 clone, I get

parsing took 17 ms

and on the 05ed77ee5e1f7f48351af121ca34d94c0926b7cc clone,

parsing took 8008 ms