risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
6.8k stars 563 forks

Tracking: implement PrefixBloomFilter #3871

Closed Li0k closed 2 years ago

Li0k commented 2 years ago

chenzl25 commented 2 years ago

The Timestampz type is encoded as an i64, which differs from the encoding of Timestamp, so we need to fix it.

Li0k commented 2 years ago

SSTBuilder will hold the SliceTransform and add keys to the bloom_filter. We need to compare prefixes to avoid adding the same key repeatedly, since duplicate keys may inflate the bloom filter size.
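A minimal sketch of the prefix comparison described above, not the actual RisingWave code. It assumes keys reach the builder in sorted order, so equal prefixes are adjacent and remembering only the last prefix is enough. The `SliceTransform` trait shape, `FixedPrefixSliceTransform`, `SstBuilderSketch`, and the `Vec<u64>` standing in for the bloom filter are all hypothetical names for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical trait: maps a full key to the slice that is fed to the filter.
pub trait SliceTransform {
    fn transform<'a>(&self, full_key: &'a [u8]) -> &'a [u8];
}

/// Keeps the whole key, i.e. the behavior before prefix filters existed.
pub struct FullKeySliceTransform;

impl SliceTransform for FullKeySliceTransform {
    fn transform<'a>(&self, full_key: &'a [u8]) -> &'a [u8] {
        full_key
    }
}

/// Hypothetical transform that keeps a fixed-length prefix of the key.
pub struct FixedPrefixSliceTransform {
    pub prefix_len: usize,
}

impl SliceTransform for FixedPrefixSliceTransform {
    fn transform<'a>(&self, full_key: &'a [u8]) -> &'a [u8] {
        &full_key[..self.prefix_len.min(full_key.len())]
    }
}

/// Stand-in for the builder: it owns the transform and the filter input.
pub struct SstBuilderSketch<T: SliceTransform> {
    transform: T,
    /// Prefix of the most recently added key (assumes sorted input).
    last_prefix: Option<Vec<u8>>,
    /// Stand-in for the bloom filter: one hash per distinct prefix.
    filter_hashes: Vec<u64>,
}

impl<T: SliceTransform> SstBuilderSketch<T> {
    pub fn new(transform: T) -> Self {
        Self {
            transform,
            last_prefix: None,
            filter_hashes: Vec::new(),
        }
    }

    /// Adds a key; the filter only grows when the prefix changes.
    pub fn add(&mut self, full_key: &[u8]) {
        let prefix = self.transform.transform(full_key);
        if self.last_prefix.as_deref() == Some(prefix) {
            return; // same prefix as the previous key, skip the duplicate
        }
        let mut hasher = DefaultHasher::new();
        prefix.hash(&mut hasher);
        self.filter_hashes.push(hasher.finish());
        self.last_prefix = Some(prefix.to_vec());
    }

    pub fn filter_entry_count(&self) -> usize {
        self.filter_hashes.len()
    }
}
```

With a two-byte prefix, adding `aa1`, `aa2`, `ab1` in order would feed only two entries to the filter, while `FullKeySliceTransform` would feed three.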

Li0k commented 2 years ago

By default we use FullKeySliceTransform to preserve the previous behavior (for normal compaction).

We use the existing table_id from the compaction_group to select the slice_transform. But in some cases existing_tableids does not exist (shared buffer compaction, or every table_id in the SST has been deleted). Some thoughts on this:

  1. Use FullKeySliceTransform for this case to preserve the previous behavior.
  2. Use DummySliceTransform for this case to reduce the cost of the bloom_filter, and rebuild it when normal compaction happens.
  3. Add TableId to WriteBatch so that we can get the table_id that exists in the shared_buffer, and get the slice_transform by table_id (I prefer this option).
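The selection logic with the first two fallback options can be sketched roughly as follows. This is an illustration under assumptions, not the implementation: `SliceTransformKind`, `Fallback`, `select_transform`, and the `prefix_len_by_table` map are hypothetical, and treating anything other than a single known table as the full-key case is a simplification made for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical tag for the transform a builder should use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SliceTransformKind {
    /// Filter on the whole key (previous behavior).
    FullKey,
    /// Feed nothing useful to the filter; rebuild it in a later compaction.
    Dummy,
    /// Filter on a fixed-length prefix of the key.
    FixedPrefix(usize),
}

/// Which transform to use when no table id is known (options 1 and 2).
#[derive(Debug, Clone, Copy)]
pub enum Fallback {
    FullKey,
    Dummy,
}

/// Picks a transform from the table ids known for the output SST.
/// `prefix_len_by_table` is an assumed registry of per-table prefix lengths.
pub fn select_transform(
    existing_table_ids: &[u32],
    prefix_len_by_table: &HashMap<u32, usize>,
    fallback: Fallback,
) -> SliceTransformKind {
    match existing_table_ids {
        // No table id available, e.g. shared buffer compaction.
        [] => match fallback {
            Fallback::FullKey => SliceTransformKind::FullKey,
            Fallback::Dummy => SliceTransformKind::Dummy,
        },
        // Exactly one table: use its prefix length if one is registered.
        [table_id] => match prefix_len_by_table.get(table_id) {
            Some(&len) => SliceTransformKind::FixedPrefix(len),
            None => SliceTransformKind::FullKey,
        },
        // Several tables: stay on the safe full-key path (assumption).
        _ => SliceTransformKind::FullKey,
    }
}
```

Under option 3 the empty case would disappear for shared buffer compaction, because the table_id carried by the WriteBatch could be passed in as `existing_table_ids`.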

Li0k commented 2 years ago

Some benchmark results:

bench-08-17

bench-08-17-2

Li0k commented 2 years ago

bench-q20-0817

bench-q20-0817-2