GiedriusS opened this issue 2 years ago (status: Open)
Looking forward to the documentation. Can't wait to try it!
https://github.com/thanos-io/thanos/pull/5068 fixes [4] and [5].
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the `remind`
command if you wish to be reminded at some point in the future.
Closing for now as promised, let us know if you need this to be reopened! 🤗
From our experience, groupcache is still slow due to the last point: we need to fetch multiple keys in one request.
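For illustration only, here is a rough sketch of the difference; the names below (e.g. `FetchMulti`) are hypothetical and are not an existing Thanos or galaxycache API:

```go
package sketch

import "context"

// Fetcher sketches the two access patterns. With only the single-key call,
// a query touching N keys pays N cache round trips; a batched call
// amortizes that into one request.
type Fetcher interface {
	// Fetch retrieves one key per request: N keys -> N round trips.
	Fetch(ctx context.Context, key string) ([]byte, error)
	// FetchMulti retrieves many keys in one request: N keys -> 1 round trip.
	FetchMulti(ctx context.Context, keys []string) (map[string][]byte, error)
}
```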
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the `remind`
command if you wish to be reminded at some point in the future.
Hey @GiedriusS, I would like to pick up a sub-issue of this open issue: "Make pull request that pushes changes (Thanos update) back into Cortex".
As I am a newbie to open-source contribution and the Thanos codebase, could someone share some docs to help me understand the requirement better?
Are there any acceptance criteria for this feature? Or which flow/component in the Thanos codebase should I focus on? :)
Amazing, thanks! Regarding "Make pull request that pushes changes (Thanos update) back into Cortex": I wonder if this is still relevant, given we vendor Cortex code. Could you double-check, @GiedriusS? Also, I think it is less important, given we are not sure whether groupcache can be a better solution than Redis. We first need to figure out if groupcache is worth pursuing, so perhaps your (or any new) ideas around making it faster, plus a benchmark, should be the next step here.
cc @yeya24 @GiedriusS does that sound right?
Thanks for picking this up! I am personally doubtful whether it is worth pursuing this further; I might even be leaning towards deleting groupcache before 1.0 :cry: As far as I understand, the problem is that we serialize access to a single map
on each key retrieval: https://github.com/vimeo/galaxycache/blob/master/singleflight/singleflight.go#L41-L64. In Thanos, a bigger query can easily produce thousands of requests at the same time. I saw this once I tried deploying groupcache to prod: the latency even jumped after groupcache was enabled :scream: I think the performance issue must be the contention. This blog post is pretty illuminating: https://dgraph.io/blog/post/introducing-ristretto-high-perf-go-cache/. I wonder if there's a better strategy we could use for the map :thinking:
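To make the contention concrete, here is a minimal sketch of the pattern in the linked file (a reconstruction, not the actual galaxycache code), plus one possible mitigation; `ShardedGroup` is purely an idea, not an existing galaxycache feature:

```go
package sketch

import (
	"hash/fnv"
	"sync"
)

// call tracks one in-flight fetch for a key.
type call struct {
	wg  sync.WaitGroup
	val interface{}
	err error
}

// Group mirrors the linked pattern: one mutex guards one map, so every
// key lookup on a big query contends for the same lock.
type Group struct {
	mu sync.Mutex
	m  map[string]*call
}

func (g *Group) Do(key string, fn func() (interface{}, error)) (interface{}, error) {
	g.mu.Lock() // contention point: every key, from every goroutine, lands here
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // duplicate caller waits for the in-flight fetch
		return c.val, c.err
	}
	c := new(call)
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	return c.val, c.err
}

// ShardedGroup hashes the key to one of N independent groups so that
// unrelated keys contend on different locks.
type ShardedGroup struct {
	shards [64]Group
}

func (s *ShardedGroup) Do(key string, fn func() (interface{}, error)) (interface{}, error) {
	h := fnv.New32a()
	h.Write([]byte(key))
	return s.shards[h.Sum32()%64].Do(key, fn)
}
```

Sharding like this would only reduce lock contention between different keys; it would not address the multi-key-per-request point mentioned earlier.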
Got it. I think it's a nice piece to work on, but best effort. Still something to consider if the blockers are resolved.
What would be a benchmark that could reproduce the problem? What requirements are we looking for in terms of efficiency here?
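One hedged starting point, assuming the shared lock is indeed the bottleneck: a micro-benchmark (in a `*_test.go` file) that issues many concurrent lookups. The `standInGroup` below only mimics the locking pattern of the linked singleflight code, not its full dedup semantics:

```go
package sketch

import (
	"fmt"
	"sync"
	"testing"
)

// standInGroup reproduces only the locking pattern of the linked code:
// one mutex and one map shared by every key.
type standInGroup struct {
	mu sync.Mutex
	m  map[string]struct{}
}

func (g *standInGroup) Do(key string, fn func() ([]byte, error)) ([]byte, error) {
	g.mu.Lock()
	if g.m == nil {
		g.m = map[string]struct{}{}
	}
	g.m[key] = struct{}{}
	g.mu.Unlock()

	v, err := fn()

	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	return v, err
}

// BenchmarkSingleflightContention hammers the group with many concurrent
// lookups. If the shared lock is the problem, ns/op should degrade as
// parallelism grows (e.g. `go test -bench . -cpu 1,8,32`) instead of
// staying roughly flat.
func BenchmarkSingleflightContention(b *testing.B) {
	g := &standInGroup{}
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			key := fmt.Sprintf("key-%d", i)
			i++
			_, _ = g.Do(key, func() ([]byte, error) {
				return []byte("value"), nil
			})
		}
	})
}
```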
After https://github.com/thanos-io/thanos/pull/4818 has been merged, I am writing down a list of TODOs:
For implementing the last two, perhaps we could even switch to gRPC?
Maybe I have missed something?