vatsalmevada commented 4 years ago

Currently, many regex patterns are compiled multiple times during tokenisation. This change compiles them once and reuses the compiled patterns.

A single-threaded JMH benchmark for a simple query showed a roughly 4x performance gain with this change (about 550102 ns/op before, 124452 ns/op after). For complex queries the gain might be even higher, as some regexes were compiled for each token.
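For readers unfamiliar with the technique, here is a minimal sketch of the general idea, not the actual code from this PR. The class name `PatternReuseSketch`, the method names, and the `\w+` regex are hypothetical and chosen only for illustration. It contrasts calling `Pattern.compile` on every invocation with holding the compiled `Pattern` in a `static final` field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; names and regex are not taken from sql-formatter.
public class PatternReuseSketch {
    // Compiled once at class initialisation. Pattern instances are immutable
    // and safe to share across threads; Matcher instances are not.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Before: the same regex is recompiled on every call.
    static List<String> wordsRecompiling(String input) {
        return collect(Pattern.compile("\\w+").matcher(input));
    }

    // After: only a new Matcher is created per call; the Pattern is reused.
    static List<String> wordsReusing(String input) {
        return collect(WORD.matcher(input));
    }

    // Gathers every match found by the given matcher, in order.
    private static List<String> collect(Matcher matcher) {
        List<String> out = new ArrayList<>();
        while (matcher.find()) {
            out.add(matcher.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(wordsReusing("SELECT a FROM t"));
    }
}
```

Both variants return the same tokens; the difference is only in how often the regex is compiled, which is what the benchmark below measures for the real tokenizer.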

Perf Readings:

Before

Result "com.github.vertical_blank.sqlformatter.Benchmark.format":
  550102.151 ±(99.9%) 2618.739 ns/op [Average]
  (min, avg, max) = (545433.044, 550102.151, 559765.481), stdev = 3015.744
  CI (99.9%): [547483.411, 552720.890] (assumes normal distribution)

# Run complete. Total time: 00:00:40

Benchmark         Mode  Cnt       Score      Error  Units
Benchmark.format  avgt   20  550102.151 ± 2618.739  ns/op

After

Result "com.github.vertical_blank.sqlformatter.Benchmark.format":
  124452.246 ±(99.9%) 406.149 ns/op [Average]
  (min, avg, max) = (123735.574, 124452.246, 125859.623), stdev = 467.722
  CI (99.9%): [124046.097, 124858.395] (assumes normal distribution)

# Run complete. Total time: 00:00:40

Benchmark         Mode  Cnt       Score     Error  Units
Benchmark.format  avgt   20  124452.246 ± 406.149  ns/op

vatsalmevada commented 4 years ago

@vertical-blank please have a look at these changes.

vertical-blank commented 4 years ago

Thanks so much!

vertical-blank / sql-formatter

Reusing compiled regex patterns #25
