polarsignals / frostdb

❄️ Coolest database around 🧊 Embeddable column database written in Go.
Apache License 2.0

Sampling reservoir materialize #909

Closed thorfour closed 1 month ago

thorfour commented 1 month ago

This adds a new setting to the reservoir sampler that causes it to rebuild a new record from the existing samples in the reservoir.

The problem we were seeing: say you have a sampler of 10k rows. If enough records pass through the sampler, you end up having sampled 10k records but only use a single row from each. That means the query engine is holding onto 10k records even though it only wants a single row from each, which can cause memory to balloon.

Now the sampler can be configured to trigger once the records referenced by the reservoir reach a certain size in bytes. Once that number of bytes is reached, the reservoir copies all the rows it's currently sampling into a new record and releases all the underlying records.
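
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from this PR and does not use FrostDB's or Arrow's APIs: `record`, `sample`, `reservoir`, `limitBytes`, and `materialize` are hypothetical names, a `record` is just a slice of `int64` rows standing in for an Arrow record, and sampling uses plain Algorithm R. "Releasing" a record here simply means dropping the last pointer to it so the garbage collector can reclaim it.

```go
package main

import "math/rand"

// record is a hypothetical stand-in for an Arrow record: a block of rows
// that stays in memory as long as anything points at it.
type record struct {
	rows []int64
}

func (r *record) sizeBytes() int { return len(r.rows) * 8 }

// sample points at a single row inside a (possibly much larger) record.
type sample struct {
	rec *record
	idx int
}

// reservoir keeps up to capacity sampled rows. When limitBytes > 0 and the
// records it references reach that size, it materializes its samples.
type reservoir struct {
	capacity   int
	limitBytes int
	seen       int
	samples    []sample
	rng        *rand.Rand
}

func newReservoir(capacity, limitBytes int, seed int64) *reservoir {
	return &reservoir{
		capacity:   capacity,
		limitBytes: limitBytes,
		rng:        rand.New(rand.NewSource(seed)),
	}
}

// add offers every row of rec to the reservoir (Algorithm R), then
// materializes if the referenced bytes have reached the limit.
func (r *reservoir) add(rec *record) {
	for i := range rec.rows {
		r.seen++
		if len(r.samples) < r.capacity {
			r.samples = append(r.samples, sample{rec: rec, idx: i})
			continue
		}
		// Keep the new row with probability capacity/seen.
		if j := r.rng.Intn(r.seen); j < r.capacity {
			r.samples[j] = sample{rec: rec, idx: i}
		}
	}
	if r.limitBytes > 0 && r.referencedBytes() >= r.limitBytes {
		r.materialize()
	}
}

// referencedBytes sums the size of each distinct record that is kept alive
// by at least one sample.
func (r *reservoir) referencedBytes() int {
	counted := make(map[*record]struct{})
	total := 0
	for _, s := range r.samples {
		if _, ok := counted[s.rec]; !ok {
			counted[s.rec] = struct{}{}
			total += s.rec.sizeBytes()
		}
	}
	return total
}

// materialize copies the sampled rows into one compact record and repoints
// every sample at it, dropping all references to the original records.
func (r *reservoir) materialize() {
	compact := &record{rows: make([]int64, len(r.samples))}
	for i, s := range r.samples {
		compact.rows[i] = s.rec.rows[s.idx]
		r.samples[i] = sample{rec: compact, idx: i}
	}
}
```

In this sketch, a reservoir of 10 rows fed 100-row records (800 bytes each) with `limitBytes` set to 1000 never references 1000 bytes or more after an `add` returns, because the compact record is only 80 bytes. With `limitBytes` set to 0 the same reservoir could end up pinning up to 10 full records.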