Open diranged opened 3 months ago
Hi @diranged
Thanks for reporting. Wrt your question:
Is it possible that the bloom filtering code doesn't know how to handle sharded buckets?
Very likely yes. We haven't tested with multiple buckets yet. Even though the bucket client the bloom gateway uses should support that, it's possible that it doesn't.
I will look into this.
Hey @chaudum, did you get a chance to look into this? Cheers!
@diranged Could you share your Loki configuration?
Do you see any errors, or is your assumption solely based on the filter metrics?
However, in our Production environment where we have multiple buckets, we see that the bloom files are spread around the buckets in a seemingly random pattern.. across the 8 buckets, here's the distribution of files:
In a multi-bucket setup, the object keys are hashed and distributed across the available buckets using modulo.
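To make the "seemingly random" distribution concrete, here is a minimal sketch of hash-then-modulo bucket selection. The hash function (32-bit FNV-1a), the bucket names, and the object keys are assumptions chosen for illustration; the thread does not state which hash Loki actually uses.

```python
# Sketch of "hash the object key, then take it modulo the bucket count".
# ASSUMPTIONS: FNV-1a (32-bit) as the hash, plus hypothetical bucket names
# and object keys. None of these are taken from the Loki source.

def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by FNV prime, keep 32 bits
    return h


def bucket_for_key(key: str, buckets: list[str]) -> str:
    """Pick the bucket for an object key by hashing it modulo the bucket count."""
    return buckets[fnv1a_32(key.encode("utf-8")) % len(buckets)]


# Hypothetical bucket names for an 8-bucket setup.
BUCKETS = [f"loki-prod-{i}" for i in range(8)]

if __name__ == "__main__":
    # Hypothetical object keys; real keys will differ.
    for key in ("blooms/metas/example.json", "blooms/blocks/00/example.gz"):
        print(key, "->", bucket_for_key(key, BUCKETS))
```

Because the bucket depends only on the key, the same key always maps to the same bucket, while neighbouring keys under the same prefix can land in different buckets. That would explain files from one `blooms` prefix being scattered across all 8 buckets, and it means a reader has to apply the same mapping to locate them.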
@diranged Could you share your Loki configuration?
Our configuration is quite large - are there sections you'd like to see?
Do you see any errors, or is your assumption solely based on the filter metrics?
No errors - specifically just working based on the fact that we see no metrics reported in the sharded environment, but we do see them in our single-bucket test environment.
Describe the bug
We have a "staging" and "production" Loki environment ... and the only real fundamental difference is that our production environment uses sharded S3 buckets to handle API rate limits (we use 8 buckets). In our staging environment we can see that bloom filtering is working (via the responsiveness of the queries and the metrics), but in our production environment (with the identical loki config) bloom filters are not working for queries according to the metrics.
Staging Proof
Production Broken Though
In our "staging" environment, we can see the blooms directory being populated properly with blooms/blocks/XX/xx.gz and blooms/metas/xx.json files in the single bucket:
However, in our Production environment where we have multiple buckets, we see that the bloom files are spread around the buckets in a seemingly random pattern.. across the 8 buckets, here's the distribution of files:
Is it possible that the bloom filtering code doesn't know how to handle sharded buckets?