AntonSmolkov closed this issue 4 months ago.
The filter size is stored in 1 byte, that is why the filter size cannot be more than 255, which is a reasonable Bloom filter size for the use cases we want to address with stream filtering.
```
%% |0              |1              |2              |3              | Bytes
%% |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| Bits
%% +---------------+---------------+---------------+---------------+
%% | Bloom Size    | Reserved                                      |
%% +---------------------------------------------------------------+
```
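Just to make sure I read that layout right, here is a minimal sketch (Python rather than the actual Osiris/Erlang code, with the field layout assumed from the diagram above) of why a single size byte caps the filter at 255 bytes:

```python
import struct

# Assumed layout from the diagram above: 1-byte Bloom size followed by 3 reserved bytes.
def pack_filter_size(bloom_size_bytes: int) -> bytes:
    if not 0 <= bloom_size_bytes <= 0xFF:  # a single byte cannot encode more than 255
        raise ValueError("filter size must fit in one byte")
    return struct.pack(">B3x", bloom_size_bytes)  # big-endian byte + 3 zero padding bytes

print(pack_filter_size(255).hex())  # 'ff000000' -- 255 is the hard ceiling today
```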
If this is indeed the only technical limitation, and I understand it correctly, then utilizing an additional byte from the reserved space would enable defining a filter size of up to 65,535 bytes (65,535 * 8 = 524,280 bits). Such a filter would hold approximately 100,000 entries at a 10% error rate, compared to roughly 388 entries with the current 255-byte limit.
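For the capacity numbers I used the standard Bloom filter sizing formula, n ≈ -m · (ln 2)² / ln p for m bits and false-positive rate p. A quick sketch (the exact figures depend on how the hash-function count k is rounded, which is why online calculators report slightly lower values such as the 388 quoted above):

```python
import math

def bloom_capacity(size_bytes: int, error_rate: float) -> int:
    """Approximate number of entries an optimally tuned Bloom filter
    of the given size can hold at the given false-positive rate."""
    m = size_bytes * 8  # filter size in bits
    return int(-m * (math.log(2) ** 2) / math.log(error_rate))

print(bloom_capacity(255, 0.10))     # ~425 entries with the current 1-byte size field
print(bloom_capacity(65_535, 0.10))  # ~109,000 entries with a 2-byte size field
```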
I have a use case like this: there is an IoT system that collects data from millions of sensors (approximately 1-2 million). We are considering publishing this data to a stream (or a super stream) so that users (microservices) can consume data only from the subsets of sensors they wish to access (tens of thousands each).
I stumbled upon RabbitMQ Streams and its new filtering functionality and thought it could be a good match. Please let me know if this use case is totally exotic for RabbitMQ.
The filter size is stored in 1 byte, that is why the filter size cannot be more than 255, which is a reasonable Bloom filter size for the use cases we want to address with stream filtering.
To add to that, the choice of a single-byte field was by design.
It is not "totally exotic" but most users with fleets of devices use MQTT with queues.
We don't have sufficient evidence for much interest in larger cardinality for stream filtering.
@AntonSmolkov going from what Osiris has today to millions of "microstreams" within a stream would require 24 or 32 bytes for the Bloom filter. That's a heck of a jump to consider it to be a no-brainer, and there are only 4 bytes available in the chunk header right now.
You probably should stick to MQTT with queues (see above) for now.
@AntonSmolkov going from what Osiris has today to millions of "microstreams" within a stream would require 24 or 32 bytes for the Bloom filter. That's a heck of a jump to consider it to be a no-brainer, and there are only 4 bytes available
Well, given that consumers (applications) would subscribe to only tens of thousands of these "microstreams", the 1 million microstreams could be distributed across 10 (or more) streams using consistent hashing. Using one extra byte seems sufficient under these circumstances (100k entries per filter at a 10% error rate, as calculated above).
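Roughly what I have in mind (a hypothetical Python sketch; the stream names and sensor-ID scheme are made up, and a simple stable hash stands in for a full consistent-hash ring):

```python
import hashlib

STREAMS = [f"sensors-{i}" for i in range(10)]  # hypothetical partitioned streams

def stream_for(sensor_id: str) -> str:
    """Deterministically map a sensor to one of the streams so that
    every publisher picks the same stream for the same sensor."""
    digest = hashlib.sha256(sensor_id.encode()).digest()
    return STREAMS[int.from_bytes(digest[:8], "big") % len(STREAMS)]

# Publish to stream_for(sensor_id) with the sensor ID as the message's
# filter value; each stream's filter then only has to cover the ~100-200k
# sensors hashed to it instead of the full 1-2 million.
print(stream_for("sensor-000042"))
```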
It is not "totally exotic" but most users with fleets of devices use MQTT with queues.
I'm concerned that regular queues might not be able to handle such a high ingest rate. In my scenario, the sensor data is already collected in a massive Kafka topic; the objective is to distribute subsets of that data (sensors) to dozens of consumers (applications) with minimal traffic and storage overhead. But I'll explore this option further, thank you.
We don't have sufficient evidence for much interest in larger cardinality for stream filtering.
Let this enhancement proposal serve as the first piece of evidence :)
Hi there!
Currently, the stream's Bloom filter size is limited to 255 bytes. The reason for this limitation is explained in the corresponding article:
Additionally, the .NET client library documentation mentions:
So, why not allow setting any arbitrary integer for the Bloom filter size? Are there any technical restrictions to consider?
Describe the solution you'd like
I'd like the ability to set an arbitrary limit for the Bloom filter size. Perhaps this ability would need to be enabled via a feature flag beforehand.
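For context, today the size is set per stream at creation time; if I read the docs right, the relevant argument is stream-filter-size-bytes (x-prefixed over AMQP), capped at 255. A sketch with pika, treating the exact argument name as an assumption on my part:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Declare a stream with an explicit Bloom filter size. The argument name is
# my reading of the docs; 255 is the current maximum this issue asks to raise.
channel.queue_declare(
    queue="sensors-0",
    durable=True,
    arguments={
        "x-queue-type": "stream",
        "x-stream-filter-size-bytes": 255,
    },
)
connection.close()
```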
Describe alternatives you've considered
Additional context
Implementing the requested feature would make RabbitMQ Streams more competitive with NATS JetStream, whose consumers have a FilterSubjects property.
Filter size for 25k elements with a 20% error rate: