martinsumner / leveled

A pure Erlang Key/Value store - based on a LSM-tree, optimised for HEAD requests
Apache License 2.0
355 stars 33 forks source link

Extend query support in leveled #433

Open martinsumner opened 7 months ago

martinsumner commented 7 months ago

Currently in Riak/leveled query support is based on range queries with the application of regular expressions to allow filtering of additional attributes which have been concatenated to the term.

The following extensions are proposed:

The dependency on regular expressions raises potential issues with regards to performance. Regular expression queries should be added to perf_SUITE. There is a proposal to change the erlang re library to google re2 in OTP 28, with some potentially significant performance benefits. It may be worth experimenting with a RE2 NIF in preparation.

martinsumner commented 7 months ago

https://github.com/martinsumner/leveled/pull/434 - extends performance testing to understand the impact of potential changes.

For regex queries, within this test the time to apply the regular expression dominates:

Profile regex_query:

FUNCTION                                          CALLS        %      TIME  [uS / CALLS]
--------                                          -----  -------      ----  [----------]
prim_file:pread_nif/3                              5938     0.51    109603  [     18.46]
leveled_penciller:'-find_nextkeys/6-fun-1-'/1   5111790     0.89    190579  [      0.04]
leveled_codec:'-accumulate_index/2-fun-2-'/6    5276501     1.77    380060  [      0.07]
leveled_codec:maybe_accumulate/5                5441603     1.87    401815  [      0.07]
maps:update_with/3                              5276892     2.09    449656  [      0.09]
zstd:quick_decompress/1                          105536     4.66   1002559  [      9.50]
erlang:binary_to_term/1                          105536     8.25   1775847  [     16.83]
leveled_penciller:find_nextkeys/6              22701169     9.57   2059858  [      0.09]
re:run/2                                        5276501    67.90  14619748  [      2.77]
---------------------------------------------  --------  -------  --------  [----------]
Total:                                         52969030  100.00%  21531323  [      0.41]

The regex query test will filter out and return 1:170 results. On M1 Mac:

Fetch of 125073 index entries by regex in 423 queries took 31764 ms

Thats an average of 75ms per query to return about 300 results having scanned 50K entries.

martinsumner commented 6 months ago

Comparison if re2 is plugged into leveled:

Profile regex_query:

FUNCTION                                          CALLS        %      TIME  [uS / CALLS]
--------                                          -----  -------      ----  [----------]
leveled_penciller:'-find_nextkeys/6-fun-1-'/1   4986634     1.96    196044  [      0.04]
leveled_codec:'-accumulate_index/2-fun-2-'/6    5147292     3.70    370345  [      0.07]
leveled_codec:maybe_accumulate/5                5308330     4.04    404005  [      0.08]
maps:update_with/3                              5147672     4.25    425160  [      0.08]
zstd:quick_decompress/1                          102708    10.27   1027339  [     10.00]
erlang:binary_to_term/1                          102708    17.00   1700697  [     16.56]
leveled_penciller:find_nextkeys/6              20524284    18.39   1839717  [      0.09]
re2:run/2                                       5147292    33.67   3368657  [      0.65]
---------------------------------------------  --------  -------  --------  [----------]
Total:                                         49960989  100.00%  10003893  [      0.20]

So re2 at 0.65 microseconds per call is 75% faster than PCRE regex in OTP 26.

Fetch of 125252 index entries by regex in 428 queries took 20453 ms

Thats an average of 48ms per query to return about 300 results having scanned 50K entries.

martinsumner commented 6 months ago

https://github.com/martinsumner/leveled/pull/435 - switches to use a NIF of. the re2 library taken from https://github.com/dukesoferl/re2