koralium / flowtide

Streaming integration engine
https://koralium.github.io/flowtide/
Apache License 2.0

Add support for column store #490

Closed: Ulimo closed this 1 month ago

Ulimo commented 1 month ago

This is a large PR because a lot of code was required to support a column store.

Stream benchmark after the change:

| Method                     | Mean     | Error    | StdDev   | Processed Events / s |
|--------------------------- |---------:|---------:|---------:|---------------------:|
| InnerJoin                  | 409.8 ms | 29.85 ms | 17.76 ms |              1283166 |
| LeftJoin                   | 452.6 ms | 28.30 ms | 16.84 ms |              1657348 |
| ProjectionAndNormalization | 134.4 ms |  9.12 ms |  6.03 ms |              1488602 |
| SumAggregation             | 142.6 ms |  9.51 ms |  5.66 ms |              1402637 |

Stream benchmark before the change:

| Method                     | Mean       | Error    | StdDev   | Processed Events / s |
|--------------------------- |-----------:|---------:|---------:|---------------------:|
| InnerJoin                  | 1,138.7 ms | 38.17 ms | 25.25 ms |               446332 |
| LeftJoin                   | 1,134.5 ms | 38.01 ms | 19.88 ms |               663980 |
| ProjectionAndNormalization |   226.2 ms | 12.35 ms |  7.35 ms |               884186 |
| SumAggregation             |   236.4 ms | 19.06 ms | 11.35 ms |               845909 |
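To make the before/after comparison explicit, the short Python script below divides the numbers from the two tables. It uses only the values shown above. The two ratios per method are the drop in mean time and the gain in processed events per second. The `speedups` helper is just an illustration.

```python
# Mean run time per benchmark, in milliseconds, copied from the tables above.
BEFORE_MS = {"InnerJoin": 1138.7, "LeftJoin": 1134.5,
             "ProjectionAndNormalization": 226.2, "SumAggregation": 236.4}
AFTER_MS = {"InnerJoin": 409.8, "LeftJoin": 452.6,
            "ProjectionAndNormalization": 134.4, "SumAggregation": 142.6}

# Processed events per second, copied from the same tables.
BEFORE_EPS = {"InnerJoin": 446332, "LeftJoin": 663980,
              "ProjectionAndNormalization": 884186, "SumAggregation": 845909}
AFTER_EPS = {"InnerJoin": 1283166, "LeftJoin": 1657348,
             "ProjectionAndNormalization": 1488602, "SumAggregation": 1402637}


def speedups():
    """Return {method: (mean-time speedup, throughput ratio)} for each benchmark."""
    return {
        m: (BEFORE_MS[m] / AFTER_MS[m], AFTER_EPS[m] / BEFORE_EPS[m])
        for m in BEFORE_MS
    }


for method, (time_x, eps_x) in speedups().items():
    print(f"{method:<28} {time_x:.2f}x lower mean time, {eps_x:.2f}x events/s")
```

The join benchmarks improve by roughly 2.5x to 2.8x. Projection and aggregation improve by roughly 1.7x. Only compare the events/s column within a single method, because the benchmarks process different workloads. For example, LeftJoin has a higher mean time than InnerJoin but also a higher events/s figure.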

This change adds initial support for handling events in a columnar fashion. A group of events is stored together in a batch using the Apache Arrow columnar format.
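As a rough illustration only, the Python sketch below transposes row-shaped events into Arrow-style columns. This is not Flowtide's code, and the names `Int64Column`, `Utf8Column` and `to_batch` are hypothetical. The layout follows the Arrow columnar spec:

- Each column has a validity bitmap with one bit per row. Bits are ordered least-significant first, and 1 means non-null.
- Fixed-width values live in one contiguous buffer.
- Strings use an offsets buffer of length n + 1 that points into a single data buffer.

```python
from array import array


def _is_valid(bitmap, i):
    # Arrow validity bitmaps are LSB-first: row i is bit (i % 8) of byte (i // 8).
    return (bitmap[i // 8] >> (i % 8)) & 1 == 1


class Int64Column:
    """Hypothetical fixed-width column: validity bitmap + contiguous int64 values."""

    def __init__(self, values):
        self.length = len(values)
        self.validity = bytearray((self.length + 7) // 8)  # all rows start as null
        self.values = array("q", [0] * self.length)  # signed 64-bit slots
        self.null_count = 0
        for i, v in enumerate(values):
            if v is None:
                self.null_count += 1  # slot keeps a placeholder 0
            else:
                self.validity[i // 8] |= 1 << (i % 8)
                self.values[i] = v

    def get(self, i):
        return self.values[i] if _is_valid(self.validity, i) else None


class Utf8Column:
    """Hypothetical variable-width column: validity bitmap + offsets + one data buffer."""

    def __init__(self, values):
        self.length = len(values)
        self.validity = bytearray((self.length + 7) // 8)
        self.offsets = array("i", [0])  # 'i' is typically 32-bit, like Arrow's Utf8 offsets
        self.data = bytearray()
        for i, v in enumerate(values):
            if v is not None:
                self.validity[i // 8] |= 1 << (i % 8)
                self.data += v.encode("utf-8")
            # Null rows get an empty range: offsets[i + 1] == offsets[i].
            self.offsets.append(len(self.data))

    def get(self, i):
        if not _is_valid(self.validity, i):
            return None
        return self.data[self.offsets[i]:self.offsets[i + 1]].decode("utf-8")


def to_batch(rows):
    """Turn a group of (id, name) events into one columnar batch."""
    return {
        "id": Int64Column([r[0] for r in rows]),
        "name": Utf8Column([r[1] for r in rows]),
    }


batch = to_batch([(1, "a"), (2, None), (None, "ccc")])
```

With this layout, an operator that only reads the `id` column touches one contiguous buffer instead of every event object. That is one common reason columnar storage speeds up projections and aggregations.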

Native memory is used instead of managed memory so that memory held by the LRU cache can be freed faster.
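One plausible reading of this point is about timing. In a garbage-collected runtime, a managed buffer evicted from a cache stays allocated until the collector runs. Native memory, by contrast, can be released at the moment of eviction.

The hypothetical `NativeBufferLru` class below is a Python analogy of that idea, not Flowtide's cache. It uses anonymous `mmap` mappings as the "native" buffers. As soon as an entry is evicted in least-recently-used order, it calls `close()`, which unmaps the buffer.

```python
import mmap
from collections import OrderedDict


class NativeBufferLru:
    """Hypothetical LRU cache whose entries own buffers mapped outside the Python heap."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # insertion order doubles as recency order
        self.freed = 0

    def get(self, key):
        buf = self.entries.get(key)
        if buf is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        return buf

    def put(self, key, size):
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        buf = mmap.mmap(-1, size)  # anonymous mapping of `size` bytes from the OS
        self.entries[key] = buf
        while len(self.entries) > self.capacity:
            _, evicted = self.entries.popitem(last=False)  # least recently used entry
            # Unmap immediately; this happens even if other references to the
            # mmap object still exist, so freeing does not depend on a collector.
            evicted.close()
            self.freed += 1
        return buf


cache = NativeBufferLru(capacity=2)
a = cache.put("a", 4096)
b = cache.put("b", 4096)
cache.get("a")            # "a" is now more recently used than "b"
c = cache.put("c", 4096)  # evicts and unmaps "b"
```

The design trade-off is that code using the cache must no longer touch a buffer after eviction. In exchange, memory is returned to the operating system at a predictable point.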