The original implementation built strings in reverse because it was logically simpler to reason about however it resulted in a significant performance and memory usage impact on larger strings with lots of changes. This could be seen in Svelte, Liquid, and Angular.
This implementation is based on #311 (thanks @ABuffSeagull) with some additional tweaks which increase performance further. I ran some benchmarks on my M3 Max (12p / 4e) and tested strings of various sizes with a different number of changes:
a small string: 220 bytes with 2 changes
a medium string: 4.4KB with 5 changes
a large string: 44KB with 50 changes
an extra large string: 4.4MB with 500 changes
an extra large string with a large number of changes: 4.4MB with 5,000 changes
While we're not likely to run into cases with extra large strings I felt it was still useful to analyze the performance impact in extreme cases so we can be confident that this is not a performance bottleneck for any file (or expression) we're likely to encounter.
Before this PR:
name hz min max mean p75 p99 p995 p999 rme samples
· small string 7,001,413.96 0.0000 0.5530 0.0001 0.0001 0.0002 0.0002 0.0003 ±0.51% 3500707
· medium string 694,515.66 0.0011 0.1906 0.0014 0.0013 0.0018 0.0020 0.0685 ±0.82% 347258
· large string 9,490.75 0.0776 0.1989 0.1054 0.1488 0.1647 0.1677 0.1751 ±0.84% 4746
· extra large string 2.8841 339.10 356.18 346.73 352.49 356.18 356.18 356.18 ±1.26% 10
· extra large string (5k changes) 0.2724 3,560.79 3,812.77 3,671.26 3,689.60 3,812.77 3,812.77 3,812.77 ±1.26% 10
After this PR:
name hz min max mean p75 p99 p995 p999 rme samples
· small string 8,163,442.24 0.0000 0.5489 0.0001 0.0001 0.0002 0.0002 0.0003 ±0.23% 4081722
· medium string 4,052,634.65 0.0002 0.2099 0.0002 0.0003 0.0003 0.0004 0.0005 ±0.54% 2026439
· large string 545,278.66 0.0016 0.1896 0.0018 0.0018 0.0022 0.0023 0.0077 ±0.46% 272640
· extra large string 57,464.22 0.0155 0.2262 0.0174 0.0168 0.0218 0.1151 0.1345 ±0.54% 28733
· extra large string (5k changes) 5,818.68 0.1508 0.3839 0.1719 0.1632 0.3294 0.3367 0.3700 ±0.76% 2910
This amounts to roughly a:
~1.2x speed up for small strings
~5.8x speed up for medium strings
~57.5x speed up for large strings
~20,000x speed up for extra large strings (yes, really)
So, in the old implementation, longer and longer strings had a much bigger perf impact.
Additionally, this should reduce memory usage by quite a bit but I don't have concrete numbers for that. (Just some small random testing which confirmed that usage was lower)
The original implementation built strings in reverse because it was logically simpler to reason about however it resulted in a significant performance and memory usage impact on larger strings with lots of changes. This could be seen in Svelte, Liquid, and Angular.
This implementation is based on #311 (thanks @ABuffSeagull) with some additional tweaks which increase performance further. I ran some benchmarks on my M3 Max (12p / 4e) and tested strings of various sizes with a different number of changes:
While we're not likely to run into cases with extra large strings I felt it was still useful to analyze the performance impact in extreme cases so we can be confident that this is not a performance bottleneck for any file (or expression) we're likely to encounter.
Before this PR:
After this PR:
This amounts to roughly a:
So, in the old implementation, longer and longer strings had a much bigger perf impact.
Additionally, this should reduce memory usage by quite a bit but I don't have concrete numbers for that. (Just some small random testing which confirmed that usage was lower)