katef / libfsm

DFA regular expression library & friends
BSD 2-Clause "Simplified" License
931 stars 52 forks source link

Use insertion sort for small `sort_and_dedup_dst_buf` inputs #459

Closed silentbicycle closed 8 months ago

silentbicycle commented 8 months ago

As a follow up to #456 -- kate mentioned that examples/words still had the same outliers (https://github.com/katef/libfsm/pull/456#issuecomment-1881891325). As hashing dropped lower in the profile, spending a disproportionate amount of time in sort_and_dedup_dst_buf showed up. This makes sense, because the outliers in the words output come from blab's successive seeds generating specific outputs (such as combining 855 instances of the regex GyADewtfILV53SSPua4IKbEZbug6BWTJ8o22K6ydeXs0NWsMADsGyADewtfILV53SSPua4IKbEZbug6BWTJ8o22K6ydeXs0NWsMA...) that are pathological knots for determinisation to untangle, not just a gradually increasing input size.

Currently, sort_and_dedup_dst_buf handles its input with a couple strategies:

The outlier appeared to be due to from frequently calling it with inputs which only had a few values, but a large delta between the min and max values, so it was effectively zeroing a couple KB on the stack, setting just a few bits, and then sweeping over it again to write out an array of the set bits -- when the input array is fairly small (< = SMALL_INPUT_LIMIT), it's faster to append ascending values into a temporary buffer (so already sorted input just copies over) and then shift it forward as necessary to plug in the remaining values. In other words, use insertion sort when the entire data set fits within a few cache lines. Iit often does.

When using a words.sh output test data set kate sent me for benchmarking, this brings the total runtime for words -dt from about 11.7 seconds to about 8.4 seconds on my laptop.

I also added a check for whether the input is already sorted and deduplicated, so the function doesn't need to do anything. During benchmarking that case was only reached a small percent of the time, but it's very cheap to also note that while sweeping through to find the min and max values.

katef commented 8 months ago

As of this PR:

; ./words.sh 40000 100 | guff -b
    x: [0 - 19]    y: [0 - 3967]
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠂
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠐⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠐⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂⠀⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠄⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⢀⠀⠠⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣇⣀⣄⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀