igordejanovic / parglare

A pure Python LR/GLR parser - http://www.igordejanovic.net/parglare/
MIT License

Possible memory leak #129

Closed johnw3d closed 2 years ago

johnw3d commented 3 years ago

Description

The GLR parser seems to have a memory leak on repeated uses.

What I Did

I'm using the GLR parser with a Korean grammar to do phrase-structure analysis on a large training corpus of Korean sentences, which means calling GLRParser.parse() hundreds of thousands of times on sentences averaging 50-100 characters. Memory use climbs relatively quickly, roughly tens to hundreds of MB per 100 calls.

It may well be caused by some form of impure structure, since this happens whether I reload the grammar and re-instantiate the GLRParser for every sentence or do that once and call parse() on the same instance repeatedly.

Also, the parser does not seem to be re-entrant, which may also point to impure structures somewhere: I tried running it in a thread-pool setup and it failed with odd errors (which I can detail if wanted), but it does work with process pools.
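For readers who want to picture the process-pool arrangement, here is a minimal sketch of one way it could be set up, with one parser built per worker process so nothing is shared between workers. The names `init_worker`, `parse_one`, `parse_corpus`, and `parser_factory` are hypothetical and not taken from the report. The factory is assumed to be a picklable, module-level function that returns a callable mapping a sentence to a picklable result.

```python
from concurrent.futures import ProcessPoolExecutor

# One parse callable per worker process; never shared across processes.
_parse = None


def init_worker(parser_factory):
    """Runs once in each worker: build that worker's own parser."""
    global _parse
    _parse = parser_factory()


def parse_one(sentence):
    """Parse a single sentence with the worker-local parser."""
    return _parse(sentence)


def parse_corpus(sentences, parser_factory, workers=4, chunksize=64):
    """Parse all sentences in a process pool and return results in input order.

    parser_factory must be picklable (a module-level function), and the
    values returned by the parse callable must be picklable as well.
    """
    with ProcessPoolExecutor(
        max_workers=workers,
        initializer=init_worker,
        initargs=(parser_factory,),
    ) as pool:
        return list(pool.map(parse_one, sentences, chunksize=chunksize))
```

With parglare, the factory would presumably load the grammar and return something like the bound `parse` method of a new `GLRParser`, possibly wrapped so that only a picklable summary of each result is sent back to the parent process. That detail is an assumption and is not confirmed anywhere in this thread.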

I'll work up a reproducing rig, if that's needed.

Thanks, John.

igordejanovic commented 3 years ago

Hi, John. Thanks for reporting. It would be great if you could make a full minimal example that exhibits the behavior.
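A minimal example of the kind requested here might look like the sketch below. It uses the standard-library `tracemalloc` module to sample traced memory after each batch of repeated parse calls, so steady growth across rounds would suggest that something is being retained between calls. The helper names `measure_growth` and `make_parglare_parse_fn` are hypothetical, and the toy grammar is only a stand-in for the Korean grammar from the report, which is not shown in this thread.

```python
import gc
import tracemalloc


def measure_growth(parse_fn, sentences, rounds=5, calls_per_round=100):
    """Return traced memory (in bytes) sampled after each round of parse calls."""
    tracemalloc.start()
    samples = []
    try:
        for _ in range(rounds):
            for i in range(calls_per_round):
                # The result is discarded on purpose: only retained state counts.
                parse_fn(sentences[i % len(sentences)])
            gc.collect()  # rule out garbage that is merely awaiting collection
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


def make_parglare_parse_fn():
    """Build a parse callable from a toy ambiguous grammar (an assumption)."""
    # Imported lazily so the measuring helper works without parglare installed.
    from parglare import Grammar, GLRParser

    grammar = Grammar.from_string(r"""
    E: E '+' E | number;
    terminals
    number: /\d+/;
    """)
    return GLRParser(grammar).parse
```

A run could then be as simple as `print(measure_growth(make_parglare_parse_fn(), ["1 + 2 + 3 + 4"]))`. Passing a different callable makes it possible to check the surrounding rig separately from the parser, and to compare reusing one parser against rebuilding it on every call.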

johnw3d commented 3 years ago

Hi Igor. I've been quite busy lately and am having a little trouble producing a minimal example without all my additional rigging, large grammars, and test source sets. I'll get you something when I can, and check (for the 4th time!) that it is not my rigging that has the leak. I'll report back here again soon.

igordejanovic commented 3 years ago

Hi, John. Do you still experience these problems with the new version? I've just released 0.15.0, and a lot has changed since 0.12, so you might give it a try.

igordejanovic commented 2 years ago

@johnw3d Hi. I'm closing this as stale and non-reproducible. If the problems persist with the newest version, feel free to reopen.