This makes a few improvements that make parsing of large projects significantly faster. On my machine, parsing an example project (about 10 MB) goes from 5.9 seconds (fastest of 10 runs) to 1.8 seconds, roughly a 3x speedup. As a rough lower bound on how fast we could reasonably parse a project of this size, I timed a loop over every character that checks whether it is "a" and increments a counter. That baseline took 0.9 seconds, so we're much closer to it now.
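For reference, the baseline measurement described above can be sketched roughly like this. The function names `count_a` and `time_baseline` are hypothetical and are not part of this change; the exact script I used may have differed in detail.

```python
import time


def count_a(text):
    """Touch every character once: count how many are "a"."""
    count = 0
    for ch in text:
        if ch == "a":
            count += 1
    return count


def time_baseline(text, runs=10):
    """Return the fastest wall-clock time (in seconds) over `runs` passes."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        count_a(text)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the fastest of several runs, as with the parser timings, reduces noise from whatever else the machine is doing.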
The basic strategies taken here were: speeding up padding/whitespace/comment parsing by using a `frozenset` to check whether a character is one of the ones we care about (which is significantly faster than chaining equality checks with `or`); inlining methods, particularly those that are only used once or are called the most, such as `_ignore_whitespace` and `_ignore_comment`; and pulling each string value out of the input string in one slice, rather than building it one character at a time.
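As an illustration of two of these strategies, here is a minimal sketch, not the actual parser code. The names `WHITESPACE`, `skip_whitespace`, and `read_quoted_string` are hypothetical, and the string reader assumes double-quoted values with no escape sequences, which the real format may not guarantee.

```python
# Hypothetical set of characters to skip; the real parser's set may differ.
WHITESPACE = frozenset(" \t\r\n")


def skip_whitespace(text, pos):
    """Return the index of the first non-whitespace character at or after pos.

    A single frozenset membership test replaces a chain such as
    `c == " " or c == "\t" or c == "\r" or c == "\n"`.
    """
    end = len(text)
    while pos < end and text[pos] in WHITESPACE:
        pos += 1
    return pos


def read_quoted_string(text, pos):
    """Read a double-quoted string starting at text[pos].

    Returns (value, index just past the closing quote). The value is taken
    with one slice instead of being appended to character by character.
    Assumes no escape sequences inside the quotes.
    """
    if text[pos] != '"':
        raise ValueError("expected opening quote")
    close = text.index('"', pos + 1)  # raises ValueError if unterminated
    return text[pos + 1:close], close + 1
```

For example, `skip_whitespace("  \n\tabc", 0)` returns `4`, and `read_quoted_string('"hello" rest', 0)` returns `("hello", 7)`.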