grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

[Bloom] Shard the Bloom block into slices based on the hash of the token value written #12386

Open · honganan opened this issue 6 months ago

honganan commented 6 months ago

**Is your feature request related to a problem? Please describe.**

I've noticed that the community is rolling out a new feature for accelerating queries with Bloom filters, which is a significant leap forward. I've perused the source code, and its ingenuity is astounding.

In our scenario, it's common for a single query to scan over 10TB of data. By rough estimates, the N-gram tokens generated from this data would reach terabyte levels when placed into a Bloom filter. Even if we boost I/O capabilities by adding more disks, this would still incur considerable costs.

**Describe the solution you'd like**

We could hash-map the generated N-gram tokens into, for example, 100,000 shards (write I/O might be under pressure, but we can add some machines and increase disk IOPS). These shards would be stored as separate files.

When a user executes a query, we would perform the same hash mapping on the tokens generated by the query's line filter. This means that a line filter of 32 characters in length would only map to about 30 of the shards. In other words, we would only need to load the Bloom filter data for 30 out of 100,000 shards, which would significantly improve the performance of accelerated queries.
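
To make the fan-out concrete, here is a minimal sketch of the idea. The FNV hash, the 4-character n-gram length, and the 100,000 shard count are illustrative assumptions, not Loki's actual tokenizer or hashing:

```go
// Illustrative only: how many of 100,000 hypothetical shards a 32-character
// line filter touches when each 4-gram is hashed to a shard.
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	ngramLength = 4
	shardCount  = 100_000
)

// shardOf hashes one n-gram token and maps it to a shard index.
func shardOf(token string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(token))
	return h.Sum32() % shardCount
}

func main() {
	filter := "abcdefghijklmnopqrstuvwxyz012345" // a 32-character line filter
	touched := map[uint32]struct{}{}
	for i := 0; i+ngramLength <= len(filter); i++ {
		touched[shardOf(filter[i:i+ngramLength])] = struct{}{}
	}
	fmt.Printf("%d n-grams touch %d of %d shards\n",
		len(filter)-ngramLength+1, len(touched), shardCount)
}
```

With an n-gram length of 4, a 32-character filter produces 32 − 4 + 1 = 29 n-grams, so at most 29 of the 100,000 shards would ever need to be consulted for that query.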

**Describe alternatives you've considered**

I might not have fully understood the community's design, and I apologize if this suggestion is somewhat presumptuous.

**Additional context**

honganan commented 6 months ago

@owen-d

chaudum commented 5 months ago

> [!NOTE]
> Bloom filters are an experimental feature and are subject to breaking changes.

Hi @honganan

> In our scenario, it's common for a single query to scan over 10TB of data. By rough estimates, the N-gram tokens generated from this data would reach terabyte levels when placed into a Bloom filter. Even if we boost I/O capabilities by adding more disks, this would still incur considerable costs.

How did you come to this conclusion? The default values of n-gram length, skip-factor, and false-positive rate we use for bloom filter creation result in bloom data that is roughly 2-3% of the chunk data.

The great thing about log data is usually that a lot of it is repetitive/static and only a fraction is variable. This keeps the number of unique n-grams "relatively small" compared to the overall token count.

You can try generating bloom blocks to see what the ratio between chunk data and bloom data is for you. I suggest that you read https://grafana.com/docs/loki/latest/operations/query-acceleration-blooms/ first.

honganan commented 4 months ago

> How did you come to this conclusion? The default values of n-gram length, skip-factor, and false-positive rate we use for bloom filter creation result in bloom data that is roughly 2-3% of the chunk data.

Sorry for the late reply. It's my fault for not describing our scenario clearly: we have a datacenter, belonging to a single tenant, that produces more than 100TB of logs per day, and one of the biggest gateway services alone produces 10TB+.

The N-gram tokenizer does generate fairly small data. Our difficulty is that a single query targeting the gateway service will need to read at least 200GB of bloom blocks (calculated as 2% of 10TB), and making that finish in seconds is a problem. We figured we might need 100 SSDs with 300MB/s bandwidth each.
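
Back-of-the-envelope, that is where the sizing comes from, using the 2% ratio and the per-disk bandwidth above:

$$10\ \text{TB} \times 2\% = 200\ \text{GB}, \qquad \frac{200\ \text{GB}}{100 \times 300\ \text{MB/s}} \approx 6.7\ \text{s}$$

Even with that much aggregate bandwidth the scan barely fits into a few seconds, which is why we are looking at reducing how much bloom data a single query has to touch.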

honganan commented 4 months ago

The inspiration for this idea is that every n-gram token is deterministic: we can compute a hash of it and take it modulo a total shard count, and we can do the same for the n-grams of a query. Suppose we have a `hello world` log line; sharding into 3000 blooms, we would write it like this:

hash(hell) % 3000 = 0         // bloom 0 (suppose it is 0)
hash(ello) % 3000 = 1         // bloom 1
...
                              // bloom 2999

If a query is {app=xxx} |= "ello", we calculate that its token falls into bloom 1, so we read only bloom 1 out of the 3000 blooms; this should be significantly faster.
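
A minimal sketch of that write/query symmetry, assuming 3000 shards, 4-character n-grams, FNV hashing, and a plain set standing in for each per-shard bloom filter (none of these are Loki's actual implementation):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	shardCount  = 3000
	ngramLength = 4
)

// shardedBlooms stands in for 3000 separately stored bloom blocks;
// each shard is just a set here instead of a real bloom filter.
type shardedBlooms []map[string]struct{}

func newShardedBlooms() shardedBlooms {
	s := make(shardedBlooms, shardCount)
	for i := range s {
		s[i] = map[string]struct{}{}
	}
	return s
}

// shardOf hashes an n-gram token and takes it modulo the shard count.
func shardOf(token string) int {
	h := fnv.New32a()
	h.Write([]byte(token))
	return int(h.Sum32() % shardCount)
}

// Write path: every n-gram of a log line is added to the bloom of its shard.
func (s shardedBlooms) addLine(line string) {
	for i := 0; i+ngramLength <= len(line); i++ {
		tok := line[i : i+ngramLength]
		s[shardOf(tok)][tok] = struct{}{}
	}
}

// Query path: only the shards that the filter's n-grams hash to are read.
func (s shardedBlooms) mayMatch(filter string) bool {
	for i := 0; i+ngramLength <= len(filter); i++ {
		tok := filter[i : i+ngramLength]
		if _, ok := s[shardOf(tok)][tok]; !ok {
			return false // a missing n-gram rules the data out
		}
	}
	return true
}

func main() {
	blooms := newShardedBlooms()
	blooms.addLine("hello world")

	fmt.Println(blooms.mayMatch("ello")) // true:  reads only shard shardOf("ello")
	fmt.Println(blooms.mayMatch("wxyz")) // false: that 4-gram was never written
}
```

Only the shards that a query's n-grams hash to ever need to be loaded, which is where the speedup would come from.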

I have finished the initial development and local testing, and I'd be glad to send a PR when it's good enough.