brancz opened 4 years ago
Bloom filters would be a great tool for this. +1 on investigating the false positive rate for querier api requests.
In theory it sounds good, but I wonder about the performance implications of keeping that bloom map up-to-date, especially in the case of Sidecar/Prometheus. This probably makes the most sense in environments where there are a bunch of disparate Thanos Query instances and one "on top" which is being queried. Either way, it would be nice to have such a metric so that we could rationalize this idea better.
In my eyes, only a small and very frequently used portion of labels would be maintained by such a filter, so it should be fine.
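For context, a Bloom filter answers "possibly in set" with no false negatives, which matches the semantics of the existing external-label check. A minimal sketch in Go (illustrative only, not the Thanos implementation; the bit count, probe count, and FNV-based double hashing are arbitrary choices):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a toy Bloom filter: k probe positions derived from one
// 64-bit FNV-1a hash via double hashing.
type bloom struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of probes per element
}

func newBloom(m, k uint64) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

func (b *bloom) indexes(s string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, (sum>>32)|1 // h2 forced odd to avoid degenerate probing
	idx := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		idx[i] = (h1 + i*h2) % b.m
	}
	return idx
}

func (b *bloom) Add(s string) {
	for _, i := range b.indexes(s) {
		b.bits[i/64] |= 1 << (i % 64)
	}
}

// MayContain returns false only if s was definitely never added;
// true means "possibly in set" (false positives are allowed).
func (b *bloom) MayContain(s string) bool {
	for _, i := range b.indexes(s) {
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1<<16, 4)
	f.Add("http_requests_total")
	fmt.Println(f.MayContain("http_requests_total")) // true
	fmt.Println(f.MayContain("definitely_absent"))   // false (with high probability)
}
```

The querier would only need the "definitely not there" answer to skip a store; a false positive just degrades to today's behavior of making the request anyway.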
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think we have yet to instrument the proxy store type (used with the query component) to validate whether and how much this is happening. I do suspect that it will highly vary across environments.
With #2030 merged we should be able to at least identify the potential for this.
Maybe it's still valid.
Looking at our prod environment, the metrics show that we make a lot of requests that return empty responses.
Hello 👋 Looks like there was no activity on this issue for the last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity for the next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the remind command if you wish to be reminded at some point in the future.
As mentioned in the previous comment, there is definitely room for improvement still.
Still valid.
Do we have any plans/design docs for this feature? I have some time recently, so maybe I can run some experiments first.
One simple idea is to get all metric names (/api/v1/label/__name__/values) for each store in the Info() StoreAPI call and keep them in a bloom filter in StoreRef. Only keeping metric names cannot work for queries that don't specify __name__, but it covers most cases, and I think it is a good start.
The problem is that the Info API is called every 5s for each store, so I am not sure about the performance overhead (it might be big, because we need to query all metric names in the store's time range). Also, for updating/deleting elements in the bloom filter, how can we make each update efficient?
Do we have any plans/design docs about this feature?
I don't think we have. A design doc would be a great place to start for whoever decides to pick up this issue. We can elaborate more on the challenges you have mentioned.
There's also a bit of experimentation needed to pick the correct approach/data structure for the implementation. I have come across some nice articles and projects recently that are worth reading before starting (I haven't dug deep, though):
On Bloom filter efficiency: https://gopiandcode.uk/logs/log-bloomfilters-debunked.html
On quotient filters: https://github.com/facebookincubator/go-qfext
I think we have yet to measure how often we're unnecessarily hitting stores, to validate further work.
Still valid, help wanted.
I have worked on this for a bit (https://github.com/thanos-io/thanos/compare/main...GiedriusS:feature/bloom_metric_names) and I have some insights: a selector such as foo{bar="baz"} does not only match series exactly; it also matches foo{bar="baz",aaa="bbb"} and so on. Given this, I am not sure how useful this feature would be with bloom filters. Quotient filters do not seem to be any different in this regard. I couldn't find any data structure that would be more useful given how labels/PromQL work.
This implementation looks nice, great work! I feel we could have label names in the bloom filter as well, and it should work: if a label is specified in an equality or regex matcher, then the label name needs to be there. However, I think the label names and metric names are almost the same across blocks in S3, so it doesn't help a lot in that case. An improvement would be adding label values to the bloom filter for equality matchers; that works well if you want to query some high-cardinality labels like a pod ID, but it also increases the bloom filter's false positives.
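To make the "extra labels still match" semantics concrete: a pruning check cannot hash whole label sets, but it can test each equality matcher's name/value pair independently, treating regex and negative matchers as unanswerable. A sketch (a plain map stands in for the probabilistic filter; the matcher type and function name are made up for illustration):

```go
package main

import "fmt"

// matcher is a simplified PromQL label matcher; only '=' matchers
// can be answered by a membership filter.
type matcher struct {
	Name, Value string
	IsEqual     bool
}

// storePossiblyHasData returns false only if some equality matcher's
// name/value pair is definitely absent from the store's filter.
// Regex/negated matchers are skipped: they must be treated as "maybe".
func storePossiblyHasData(filter map[string]bool, ms []matcher) bool {
	for _, m := range ms {
		if !m.IsEqual {
			continue // cannot prune on regex/negated matchers
		}
		if !filter[m.Name+"="+m.Value] {
			return false // definitely not in this store
		}
	}
	return true
}

func main() {
	filter := map[string]bool{"__name__=foo": true, "bar=baz": true}

	fmt.Println(storePossiblyHasData(filter,
		[]matcher{{"__name__", "foo", true}, {"bar", "baz", true}})) // true
	fmt.Println(storePossiblyHasData(filter,
		[]matcher{{"__name__", "missing", true}})) // false
	fmt.Println(storePossiblyHasData(filter,
		[]matcher{{"bar", "b.*", false}})) // true: regex, cannot prune
}
```

Checking matchers individually sidesteps the "foo{bar="baz"} also matches foo{bar="baz",aaa="bbb"}" problem, at the cost of never pruning on regex selectors, which limits how much this can help in practice.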
Closing for now as promised, let us know if you need this to be reopened! π€
Maybe let's add an initial version based on label names? The matchers can only be exact there so bloom filters should work wonderfully there :thinking:
Since some preliminary research has already been done, should we add a gsoc/lfx label on this issue?
We talked a bit about this during KubeCon and I think the consensus is that we need some central place where we'd store the bloom/cuckoo/... filters. If we stored them locally, then in a distributed system we could get inconsistent results: for example, one Thanos Query could retrieve data from nodes A and B, whereas another Thanos Query could have newer information and retrieve data from nodes A, B, and C. Hence, something like etcd/Consul/... is needed to store these filters for each StoreAPI.
I think it's still doable during GSoC/LFX.
Hm, isn't that the same case we have with filtering out stores based on external/announced labels?
Hmmm, that's true :thinking: then perhaps we wouldn't be introducing any new inconsistency issues. Let's mark it as a GSoC/LFX project.
One point to discuss: whether central Queriers should use something like the Info API to collect the bloom filter info or not. If queriers collect that info periodically, the filters can be stale between refreshes; if we instead add another API for each store to test whether some metric exists, every check involves an RTT.
For the inconsistency problem, I think we can make it manageable if we only keep metric names in the bloom filters. They are usually slow to change: a metric typically stays in a store for a long time, and once it goes away, it's still okay for the querier to reach out to that store.
I don't think we can assume that metric names are slow to change. Whenever a new block gets loaded, we cannot tell whether new metrics have been introduced or not. Unless some push model is used, with the current pull model we cannot always get up-to-date data.
Correct, but loading a new block already has an expected delay, which means we can tolerate a 5-10s delay when announcing new metrics from this block. Maybe this bloom filter can be rebuilt when a block is loaded?
The bigger challenge is in-memory ingestion like Prometheus and Receivers, but if we keep the delay to a few seconds, I don't think it will be a big issue. There might be some inconsistency when one Prometheus replica has announced a metric and another hasn't, but I guess this is why we duplicate data so it won't be that different from a partial read.
Right. I don't think we have a good way to deal with this case now. For either the Prometheus sidecar or receiver mode, how is the bloom filter maintained for in-memory data?
Receiver mode seems fine, since we can introduce a bloom filter and update it on write. But sidecar mode is hard, and we can only poll periodically.
I am still thinking about using the trigram index mentioned in the Monarch paper https://www.vldb.org/pvldb/vol13/p3181-adams.pdf.
This is similar to the bloom filter approach but works better in the regex-match case. The same problem applies here, though: I don't see a good way to maintain the trigrams in memory, especially for Prometheus and Receivers.
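For illustration, the trigram idea reduces a substring (or regex-literal) check to set membership of all of the query literal's trigrams: if any trigram is absent, no indexed name can contain that literal. A toy sketch, with an exact Go map standing in for whatever compressed representation a real index would use:

```go
package main

import "fmt"

// trigrams returns all 3-byte substrings of s.
func trigrams(s string) []string {
	var out []string
	for i := 0; i+3 <= len(s); i++ {
		out = append(out, s[i:i+3])
	}
	return out
}

type trigramIndex struct{ set map[string]bool }

func newTrigramIndex() *trigramIndex {
	return &trigramIndex{set: map[string]bool{}}
}

// Add indexes every trigram of a metric name.
func (t *trigramIndex) Add(name string) {
	for _, g := range trigrams(name) {
		t.set[g] = true
	}
}

// MayContainSubstring: if any trigram of the literal is missing,
// no indexed name can contain the literal; otherwise "maybe".
func (t *trigramIndex) MayContainSubstring(lit string) bool {
	for _, g := range trigrams(lit) {
		if !t.set[g] {
			return false
		}
	}
	return true
}

func main() {
	idx := newTrigramIndex()
	idx.Add("http_requests_total")
	fmt.Println(idx.MayContainSubstring("requests")) // true
	fmt.Println(idx.MayContainSubstring("grpc"))     // false
}
```

Like a bloom filter, this only ever rules stores out, but it can also prune on regex matchers whose required literals are at least three characters long, which is the advantage the Monarch paper exploits.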
I think for receivers we might be able to build the index during ingestion time and maybe rebuild it on boot if there is a crash.
I agree that for the sidecar it will be quite challenging. Maybe we could make labels/__name__/values calls against Prometheus on a periodic basis.
Thanos currently performs a "possibly in set" check in order to exclude certain StoreAPIs from calls. This check only excludes a StoreAPI if all of its sets of external labels indicate that the time-series selected by a query can never include data from that StoreAPI. While this check never wrongly excludes a store, the potential for empty responses is still large, as it only helps when the query actually selects on labels that a StoreAPI happens to expose.
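The existing external-label check can be sketched as follows. This is a simplification of the idea, not the actual Thanos code: real matching also handles regex and negative matchers, and the function name here is made up.

```go
package main

import "fmt"

// canSkipStore reports whether a store can be excluded from a query:
// only if, for every set of external labels the store announces, some
// equality matcher conflicts with that set. Matchers on labels the
// store does not announce can never cause exclusion.
func canSkipStore(extLabelSets []map[string]string, matchers map[string]string) bool {
	for _, ls := range extLabelSets {
		conflict := false
		for name, want := range matchers {
			if got, ok := ls[name]; ok && got != want {
				conflict = true
				break
			}
		}
		if !conflict {
			return false // this label set may match, so we must query the store
		}
	}
	return true
}

func main() {
	store := []map[string]string{{"cluster": "eu1", "replica": "a"}}

	fmt.Println(canSkipStore(store, map[string]string{"cluster": "us1"})) // true: can skip
	fmt.Println(canSkipStore(store, map[string]string{"cluster": "eu1"})) // false: must query
	fmt.Println(canSkipStore(store, map[string]string{"job": "node"}))    // false: job is not external
}
```

The third case is the gap this proposal targets: a query on labels the store never exposes sails through the external-label check and produces an empty response.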
In order to expand these "possibly in set" checks, I propose that StoreAPIs additionally expose one or more probabilistic data structures via the Info API, next to the sets of external labels. Data structures such as Bloom filters or Cuckoo filters could be good candidates. This way we could minimize unnecessary calls to StoreAPIs in a space-efficient way. This would not only improve performance but also reliability, as any additional and unnecessary network request has the potential to cause what seems like a partial response even when it's not.
To further verify that this would be useful, I propose instrumenting the querier with metrics to confirm that these types of empty StoreAPI series requests are actually happening and therefore impacting performance; otherwise this optimization is likely not worth it.
@bwplotka @domgreen @GiedriusS @metalmatze @squat @kakkoyun