valpackett / pcre-heavy

A Haskell regular expressions library that doesn't suck | now on https://codeberg.org/valpackett/pcre-heavy
https://codeberg.org/valpackett/pcre-heavy
The Unlicense
51 stars 6 forks source link

scanRanges is extremely slow on dense match input #14

Open ChrisPenner opened 4 years ago

ChrisPenner commented 4 years ago

Referenced in https://github.com/ChrisPenner/lens-regex-pcre/issues/10 with benchmarks here: https://github.com/jamesdbrock/replace-benchmark#results

We can see that pcre-heavy is having a bit of trouble with substitutions with LOTS of matches, I did a quick profile on it and the VAST majority of the time is spent inside rawMatch in scanRanges0. On large strings; it clocks in at OVER 10 minutes, whereas the next longest time is just over one second.

After a cursory look, it seems it is serially calling out to C for each consecutive match, and just iterating on that, there's also a lot of fiddly math going on, but nothing that seems to be the "obvious" culprit. Perhaps there's some unknown copying going on too, but I didn't dig deep enough to see.

I can dig around a bit more, but if anyone has suggestions or tips on a good way to speed this up, or even ideas of where the cost centers might be I'm open to ideas!

I understand that the benchmark in question represents a sort of "perfect storm" in terms of triggering bad performance, but it would be nice if there's some way we can get this into at least the same ballpark :)

ChrisPenner commented 4 years ago

This might be a good thing for someone to look into for #hacktoberfest

valpackett commented 4 years ago

I'd be surprised if anyone seriously participates in that, seems like it's more of a spam-fest..

ChrisPenner commented 4 years ago

That's an interesting article, and it's probably true for much of the github ecosystem, but I have a feeling Haskell repos are likely shielded at least a bit from poor quality submissions (Haskell doesn't even show up in the list on the main site).

A quick glance over closed hacktoberfest issues in Haskell shows a surprising amount of legitimate work happening 😄

Granted, this isn't so much of a "quick win"