kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

Allow 'match' to match any regular expression and parse multiple characters at once #206

Closed ojundt closed 2 years ago

ojundt commented 4 years ago

Parslet is a great library (thanks for your work!) but it's also VERY memory hungry. While using parslet to parse bank account statements in MT940 format, I observed memory usage of up to 1.6 GB for an input file of less than 3 MB.

One of the problems is the overhead of having to use many atoms to describe something that could easily be expressed with a single regular expression. I'm not sure why you've chosen to have 'match' consume only a single character at a time, but it makes grammars unnecessarily complex.

This PR extends the 'match' atom to accept any regular expression and removes the single-character limit.
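To illustrate the difference, here is a minimal sketch using only Ruby's standard `StringScanner`. It is not parslet code and not the implementation in this PR; the helper names `scan_digits_per_char` and `scan_digits_at_once` are made up for the example. The first helper mimics what a grammar like `match('[0-9]').repeat(1)` has to do today (one match, and one result piece, per character), while the second consumes the whole run with one multi-character regular expression.

```ruby
require 'strscan'

# Hypothetical helper: consume a run of digits one character at a time,
# similar in spirit to repeating a single-character atom.
def scan_digits_per_char(input)
  scanner = StringScanner.new(input)
  pieces = []
  # Each iteration matches exactly one character and allocates one string.
  while (ch = scanner.scan(/[0-9]/))
    pieces << ch
  end
  pieces
end

# Hypothetical helper: consume the same run with a single
# multi-character regular expression.
def scan_digits_at_once(input)
  scanner = StringScanner.new(input)
  piece = scanner.scan(/[0-9]+/)
  piece ? [piece] : []
end

per_char = scan_digits_per_char("20240131rest")
at_once  = scan_digits_at_once("20240131rest")

puts "per character: #{per_char.size} pieces"
puts "at once:       #{at_once.size} piece"
```

Both helpers recognise the same text, but the per-character version produces one intermediate object for every input character. In a real parser each of those pieces also carries position and bookkeeping data, which is plausibly where much of the overhead comes from.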

When I simplified the MT940 parser with this change, memory usage dropped by 66%.