siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License
91 stars 24 forks source link

Parsing takes absurdly long time for some simple documents #115

Closed squ1dd13 closed 1 month ago

squ1dd13 commented 1 month ago

Hi,

I opened #4429 over at LaTeX Workshop regarding significant performance issues with AST parsing. From my experiments it looks like \newenvironment in particular is able to cause these problems. I was told to come here because unified-latex is used by LaTeX Workshop for parsing. The details are all in the linked issue, including a simple example. As I wrote in that issue, I would be happy to privately share a short document which takes upwards of a minute to parse if this would be useful.

Thanks :)

siefkenj commented 1 month ago

I don't think this is caused by the parser itself, but possibly some further processing steps?

If you put your sample document from #4429 into https://siefkenj.github.io/latex-parser-playground/ it parses almost instantly.

squ1dd13 commented 1 month ago

I've just tried that and indeed it is practically instant, but then I don't know what's going on because the long times that LaTeX Workshop reports seem to be calculated across a call to unified-latex: unifiedParser.parse(...) via parseLaTeX(...) via parse.tex(...). I don't know TS very well but I don't immediately see much in there that could cause such high CPU usage other than the parser, unless there's somehow something weirdly specific to my machine going on here.

@James-Yu are you able to reproduce the issue in LaTeX Workshop? If not, I'll just close this issue since it must be on my end, but if you can reproduce it, I guess we go back to the original issue and close this one? Unless you still think the bug is with unified-latex?

Sorry to discuss LaTeX Workshop specifics on an issue for unified-latex, but at this point I don't know which project the bug is in. Thanks to both of you for your help so far.

James-Yu commented 1 month ago

I can reproduce the problem with LaTeX Workshop, and the playground parsed the content very fast. However, it's quite unlikely an issue related to LW.

I do think we can continue the investigation back in the other issue and close this one for now, until the root cause is identified.

squ1dd13 commented 1 month ago

(For anyone else: We're back from LW #4429 again.)

I've tried this on various different builds and the problem seems to have been introduced between v1.6.1 and v1.7.0. On v1.6.1, this script shows

parsing took 19 ms

whereas on v1.7.0, I get

parsing took 8051 ms
squ1dd13 commented 1 month ago

On further investigation, it seems to have started in 05ed77ee5e1f7f48351af121ca34d94c0926b7cc "Switch to Vite for building".

To double check, I made two fresh clones, checked out c8213f85b58882204a8a3baac047c4791af0d618 on one and 05ed77ee5e1f7f48351af121ca34d94c0926b7cc on the other, then on each:

mkdir examples
cp /path/to/parser-choke.ts examples/
npm install
npm run build -ws
npx vite-node examples/parser-choke.ts

On the c8213f85b58882204a8a3baac047c4791af0d618 clone, I get

parsing took 17 ms

and on the 05ed77ee5e1f7f48351af121ca34d94c0926b7cc clone,

parsing took 8008 ms
siefkenj commented 1 month ago

Great detective work. But it's very confusing that the bundler caused this! Are you able to profile and see which function is actually causing the slowdown?

squ1dd13 commented 1 month ago

Here's the top of the list: image

Here's the .cpuprofile: CPU.20241017.200424.7326.0.001.cpuprofile

Earlier I tried building the latest unified-latex without Vite and it did seem to fix the issue completely. It's really weird.

Edit: When I say 'without Vite', I mean I did some pretty unspeakable things to the repo in order to revert the build system change. I'm quite out of my depth here. :laughing:

siefkenj commented 1 month ago

Very interesting! I wonder if it has something to do with the peggy compiler. In vite I wrote a new plugin to get peggy to work. I may have not copied over all the settings from the esbuild version...

squ1dd13 commented 1 month ago

Like this? I don't see the cache option here.

(I can't test right now, but I'll have a look later.)

siefkenj commented 1 month ago

Yes. That looks relevant!