saadsharif / ttds-group

TTDS Group Project

Performance Improvements - skip pointers, storage format, JSON encoding #24

Closed gingerwizard closed 2 years ago

gingerwizard commented 2 years ago

This adds the index side of skip pointers. Only the index side is done so far.

Also disabled the current BERT model, as we aren't using it (yet) and it slows down startup.
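
For readers unfamiliar with the index side of skip pointers, here is a minimal sketch of what building them at index time can look like. It is not taken from this PR: the function name `build_skip_pointers` and the square-root spacing (a common textbook heuristic) are assumptions made for illustration.

```python
from math import isqrt


def build_skip_pointers(doc_ids):
    """Return (position, doc_id) skip targets for a sorted posting list.

    Assumption: pointers are spaced roughly sqrt(n) postings apart.
    The spacing actually used in this PR is not stated in the thread.
    """
    step = max(1, isqrt(len(doc_ids)))
    # Each entry records where a reader may jump to and which doc id it finds there.
    return [(pos, doc_ids[pos]) for pos in range(step, len(doc_ids), step)]


if __name__ == "__main__":
    postings = list(range(0, 32, 2))  # 16 sorted doc ids
    print(build_skip_pointers(postings))
```

Storing the target doc id next to the position lets a reader decide whether to jump without touching the postings in between.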

gingerwizard commented 2 years ago

This also uses a SIMD-optimised JSON parser too, getting our mean performance close to 1s.

Max: 7.019331
Min: 0.010152
Median: 1.0523820000000002
Mean: 1.3045525846613546
Harmonic Mean: 0.3188781817195705
95% Percentile: 3.500890149999998
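
For anyone wanting to reproduce summaries in this shape, a rough sketch using Python's standard `statistics` module follows. The thread does not say which tool produced the numbers above, so the helper name `summarise` and the "inclusive" percentile interpolation are assumptions; other tools may interpolate percentiles slightly differently.

```python
import statistics


def summarise(latencies):
    """Summarise query latencies (in seconds) with the labels used in this thread.

    Hypothetical helper: needs at least two samples because
    statistics.quantiles() does.
    """
    ordered = sorted(latencies)
    # n=100 yields 99 cut points; index 94 is the 95th percentile, 98 the 99th.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "Max": ordered[-1],
        "Min": ordered[0],
        "Median": statistics.median(ordered),
        "Mean": statistics.mean(ordered),
        "Harmonic Mean": statistics.harmonic_mean(ordered),
        "95% Percentile": cuts[94],
        "99% Percentile": cuts[98],
    }


if __name__ == "__main__":
    for label, value in summarise([0.2, 0.9, 1.1, 1.4, 3.5]).items():
        print(f"{label}: {value}")
```

The harmonic mean is dominated by the fastest queries, which is why it sits well below the arithmetic mean in the results above.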

gingerwizard commented 2 years ago

Max: 6.804227
Min: 0.009796
Median: 1.0929375000000001
Mean: 1.354577529880478
Harmonic Mean: 0.324104079055473
95% Percentile: 3.7586722999999975
99% Percentile: 4.9927768000000015

Using skip pointers. Can't use the SIMD parser as efficiently as I'd like, as it's not thread safe.
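
One common way to reuse a parser that is not thread safe is to keep one instance per thread. The sketch below shows that pattern only; it is not how this PR handles the problem. `get_parser` and `parse` are hypothetical names, and the standard library's `json.JSONDecoder` stands in for the SIMD parser so the example runs without extra dependencies.

```python
import json
import threading

# Each thread sees its own attributes on this object.
_local = threading.local()


def get_parser(factory=json.JSONDecoder):
    """Return the calling thread's own parser, creating it on first use.

    Assumption: `factory` builds a parser that is safe to reuse within a
    single thread but must not be shared across threads.
    """
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = factory()
        _local.parser = parser
    return parser


def parse(text):
    """Parse a JSON document with this thread's parser."""
    return get_parser().decode(text)


if __name__ == "__main__":
    print(parse('{"doc": 7, "positions": [1, 4, 9]}'))
```

The trade-off is one parser, and any buffers it holds, per worker thread instead of a single shared instance.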

gingerwizard commented 2 years ago

Moving to a pure text encoding of positions and pointers, and parsing it manually rather than as JSON, gives minor improvements.

Max: 7.672702
Min: 0.00736
Median: 1.0564274999999999
Mean: 1.3496451603585657

I think this is worth staying with, @lollobaldo, since it's a little faster at indexing and doesn't rely on libs (simdjson) that are platform dependent.
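
To make the idea of a pure text encoding concrete, here is a small sketch of one possible layout and a hand-written parser for it. The layout (`skips|postings`, with `index>doc_id` skip entries and `doc_id:pos,pos` postings) is invented for this example and is not the format used in the PR; `encode` and `decode` are hypothetical names.

```python
def encode(postings, skips):
    """Serialise a posting list to a single line of text.

    postings: list of (doc_id, [positions]); skips: list of (posting_index, doc_id).
    Illustrative layout only: "idx>doc,idx>doc|doc:pos,pos;doc:pos".
    """
    skip_part = ",".join(f"{idx}>{doc}" for idx, doc in skips)
    post_part = ";".join(
        f"{doc}:{','.join(str(p) for p in positions)}" for doc, positions in postings
    )
    return f"{skip_part}|{post_part}"


def decode(text):
    """Parse the line back with plain string splits, no JSON involved."""
    skip_part, post_part = text.split("|", 1)
    skips = []
    if skip_part:
        for item in skip_part.split(","):
            idx, doc = item.split(">")
            skips.append((int(idx), int(doc)))
    postings = []
    if post_part:
        for item in post_part.split(";"):
            doc, positions = item.split(":")
            parsed = [int(p) for p in positions.split(",")] if positions else []
            postings.append((int(doc), parsed))
    return postings, skips


if __name__ == "__main__":
    line = encode([(3, [1, 5]), (7, [2]), (12, [4, 9, 10])], [(2, 12)])
    print(line)
    print(decode(line))
```

A format like this only needs integer parsing and a few delimiters, which is one reason a manual parser can compete with a general JSON library.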

gingerwizard commented 2 years ago

I don't think these skip lists help enormously on AND queries - any early loop termination is offset by the increased cost of the read. They do lower the 95th and 99th percentiles though and improve the worst case, so merging.
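
For context, this is roughly what an AND merge that follows skip pointers looks like. It is a sketch under assumptions, not the code in this PR: `intersect_with_skips` is a hypothetical name, pointers are assumed to sit at every multiple of about sqrt(n) postings, and both lists are assumed to hold sorted, unique doc ids in memory.

```python
from math import isqrt


def intersect_with_skips(a, b):
    """AND-merge two sorted doc-id lists, jumping ahead where a skip is safe."""
    step_a = max(1, isqrt(len(a)))
    step_b = max(1, isqrt(len(b)))
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # A pointer exists only at multiples of step_a; follow it when
            # the target does not overshoot the doc id we are looking for.
            if i % step_a == 0 and i + step_a < len(a) and a[i + step_a] <= b[j]:
                i += step_a
            else:
                i += 1
        else:
            if j % step_b == 0 and j + step_b < len(b) and b[j + step_b] <= a[i]:
                j += step_b
            else:
                j += 1
    return result


if __name__ == "__main__":
    print(intersect_with_skips(list(range(0, 100, 3)), list(range(0, 100, 5))))
```

This in-memory version does not model the extra bytes read from storage to load the pointers, which is the cost described above as offsetting the early loop termination.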