Currently, many regex patterns are compiled repeatedly during
tokenisation. This change compiles them once and reuses the compiled
patterns.
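A minimal sketch of the idea, assuming a tokenizer that matches tokens with java.util.regex. The class, method names and the word regex below are hypothetical illustrations, not code from the sql-formatter code base. The point is that Pattern.compile is the expensive step, while creating a Matcher from an existing Pattern is cheap. Per the Pattern Javadoc, Pattern instances are immutable and safe to share across threads (Matcher instances are not).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternReuseSketch {
    // Hypothetical word-token regex, compiled once when the class is loaded
    // and shared by every call.
    private static final Pattern WORD = Pattern.compile("^([\\p{L}\\p{N}_]+)");

    // Before: the same regex is recompiled on every call.
    public static String matchWordCompilingEachTime(String input) {
        Matcher m = Pattern.compile("^([\\p{L}\\p{N}_]+)").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // After: the precompiled pattern is reused; only a Matcher is created per call.
    public static String matchWordReusingPattern(String input) {
        Matcher m = WORD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "select_1 FROM t";
        System.out.println(matchWordCompilingEachTime(input)); // select_1
        System.out.println(matchWordReusingPattern(input));    // select_1
    }
}
```

Both methods return the same result; only the second avoids paying the compilation cost on each call.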
A single-threaded JMH benchmark on a simple query shows a roughly 4.4x
speedup with this change (average time per op drops from ~550,102 ns to
~124,452 ns). For complex queries the gain may be larger, since some
regexes were previously compiled once per token.
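Where a regex is assembled at runtime (for example from a list of reserved words), the per-token cost can be removed by building it once outside the token loop and memoising the compiled Pattern by its source string. The sketch below assumes that shape; TokenLoopSketch, reservedTokens and the cache are hypothetical illustrations, not the actual code changed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TokenLoopSketch {
    // Hypothetical cache: each distinct regex source string is compiled once.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    public static Pattern cached(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    // Returns the whitespace-separated tokens of input that are reserved words
    // (case-insensitive). The reserved-word regex is built and looked up once,
    // before the loop, rather than compiled again for every token.
    public static List<String> reservedTokens(String input, List<String> reservedWords) {
        String regex = "(?i)(?:" + reservedWords.stream()
                .map(Pattern::quote) // treat each word literally
                .collect(Collectors.joining("|")) + ")";
        Pattern reserved = cached(regex);
        List<String> result = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            // Per token we only create a Matcher, which is cheap compared to compiling.
            if (reserved.matcher(token).matches()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("select", "from", "where");
        System.out.println(reservedTokens("SELECT a FROM t WHERE b = 1", words));
        // prints [SELECT, FROM, WHERE]
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the mapping function at most once per key, so each distinct regex is compiled once even under concurrent use. An unbounded cache like this only makes sense when the set of regex strings is small and fixed, as with a formatter's configuration.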
Perf Readings:
Before
Result "com.github.vertical_blank.sqlformatter.Benchmark.format":
550102.151 ±(99.9%) 2618.739 ns/op [Average]
(min, avg, max) = (545433.044, 550102.151, 559765.481), stdev = 3015.744
CI (99.9%): [547483.411, 552720.890] (assumes normal distribution)
# Run complete. Total time: 00:00:40
Benchmark         Mode  Cnt       Score      Error  Units
Benchmark.format  avgt   20  550102.151 ± 2618.739  ns/op
After
Result "com.github.vertical_blank.sqlformatter.Benchmark.format":
124452.246 ±(99.9%) 406.149 ns/op [Average]
(min, avg, max) = (123735.574, 124452.246, 125859.623), stdev = 467.722
CI (99.9%): [124046.097, 124858.395] (assumes normal distribution)
# Run complete. Total time: 00:00:40
Benchmark         Mode  Cnt       Score      Error  Units
Benchmark.format  avgt   20  124452.246 ±  406.149  ns/op