Open bwplotka opened 4 years ago
I think I would vote for new flag and StoreInfo endpoint returning types supported, like previous proposal mentioned.
Plus on the sidecar you could do --rule-api=disabled
, --store-api=disabled
or smth similiar.
Thoughts? Maybe we can write a mini proposal for this?
I'm also thinking that potentially we could do TargetsAPI which would show leaf node Targets scraped (I mean Prometheus /targets endpoint) and dedup them? Thoughts?
I think I would vote for new flag
New flag? what flag?
and StoreInfo endpoint returning types supported, like previous proposal mentioned.
Hm, yea we wanted to different info per type so its kind of equivalent. So we can detect things either by grpc reflection or info... what would be better? :thinking:
Also for target API there is issue already: https://github.com/thanos-io/thanos/issues/1375
Ok we discussed offline that it would be amazing to actually split discovery as separate gRPC API.
I think we can do it after Rules API is landed, but overall idea is that Info
endpoint is shared and Querier will only wait for --endpoint
flag or something which will then ask for info API and THEN discover what APIs are there (: This will ensure things like https://github.com/thanos-io/thanos/pull/2407#discussion_r414743260 and per store TLS will be much easier to implement.
cc @brancz @s-urbaniak @kakkoyun @squat @povilasv @GiedriusS
:+1: I like --endpoint
Changed title.
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind
command if you wish to be reminded at some point in future.
Closing for now as promised, let us know if you need this to be reopened! 🤗
Still valid.
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind
command if you wish to be reminded at some point in future.
Closing for now as promised, let us know if you need this to be reopened! 🤗
We need that! (:
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind
command if you wish to be reminded at some point in future.
Closing for now as promised, let us know if you need this to be reopened! 🤗
Still valid.
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind
command if you wish to be reminded at some point in future.
FYI: This assumes Querier is the one that federates "stuff" (:
I would be interested in tackling this as due to a possible bug we had recently I noticed that rules are tightly coupled to stores so when we try to resolve rules but stores fail sadly nothing gets served.
But first, let me try to explain if I understand correctly what we want to achieve:
This issue wants to decouple serving stores and rules and in the future targets when they are added. By passing an --endpoint
it will auto-discover what type that address is, e.g. rules or stores API, and discover them and use those API sets separately. We would also have a generic lets call it "endpointset" or "infoset" which does this. The endpointset would have a GetRuleAPIs/GetStoreAPIs which would replace the current storeset.Get and storeset.GetRulesClients. With this we could also decouple the rules away from stores, so both could be discovered and served independently.
I believe also having support for --store flag makes sense, and using that as the default endpoint to discover all stores there, would make sense. And since --rule is not exposed we could deprecate this straight away?
Let me know if anything is wrong or doesn't make sense, thanks!
By passing an --endpoint it will auto-discover what type that address is, e.g. rules or stores API, and discover them and use those API sets separately.
Yes, but with a note that endpoints can serve multiple APIs
We would also have a generic lets call it "endpointset" or "infoset" which does this. The endpointset would have a GetRuleAPIs/GetStoreAPIs which would replace the current storeset.Get and storeset.GetRulesClients. With this we could also decouple the rules away from stores, so both could be discovered and served independently.
hmmmm, I wouldn't use word independently. If the endpoint serves both Rules and Store API, those will be discovered (and probed for healthyness) in the same flow. Essentially we will do it per endpoint -no matter how many and what kind of APIs it serves. From endpointset
it has to serve Info
I believe also having support for --store flag makes sense, and using that as the default endpoint to discover all stores there, would make sense. And since --rule is not exposed we could deprecate this straight away?
Yes, but I would deprecate store
as well long term in favor of endpoints
. WDYT? also note there is store.sd-file
option as well allowing FILE SD like discovery - we will need to do that for endpoints as well :hugs:
hmmmm, I wouldn't use word independently. If the endpoint serves both Rules and Store API, those will be discovered (and probed for healthyness) in the same flow. Essentially we will do it per endpoint -no matter how many and what kind of APIs it serves. From endpointset it has to serve Info
yes, I think we are on the same page, I agree decoupling is not exactly the right word for this.
Yes, but I would deprecate store as well long term in favor of endpoints. WDYT?
sgtm
Okay, I will get started on this tomorrow, busy day of meetings today. Thanks for the feedback!
@bwplotka curious about this point from earlier in this issue:
So we can detect things either by grpc reflection or info... what would be better? 🤔 Ok we discussed offline that it would be amazing to actually split discovery as separate gRPC API.
Curious, did some deep diving into the code and curious how come you all moved away from using the gRPC reflection? I did some experimenting and I can discover what APIs the endpoint has. So for example in my little experiment, I query via the gRPC reflection services and when trying to discover in thanos query I can see my sidecar has the following services/APIs (e.g. Rules and Store):
grpc.health.v1.Health
grpc.reflection.v1alpha.ServerReflection
thanos.Rules
thanos.Store
Would this not be enough info to discover the passed --endpoint
APIs? Why was the decision towards moving to a separate gRPC info API? Thanks!!
Hm I might think separate InfoAPI might be orthogonal to endpoint discovery.
We still need to discover types and we can use reflection yes - we added that recently to our gRPC servers 🤗 We could use health endpoint for health( probably we should even)
The problem is with metadata. They are propagated through Info like: time scope, ext labels, source and potentially others in future. But ofc we are happy for amazing proposition that will improve things here! (:
We could use health endpoint for health( probably we should even)
Agreed, was thinking of that as well!
The problem is with metadata. They are propagated through Info like: time scope, ext labels, source and potentially others in future.
For the rules in query at least, we just need the external labels yes, in the info, correct? (At least for now.)
The main thing I am even bringing it up is because if we decide to create a new info API, we would still need to fallback to the one as part of the store. Migration would not work otherwise, or would not be as smooth, you could lose connection to existing components if discovery does not work.
I can see we have a few options, that I can think of:
1.Perform initial discovery via reflection and after that getInfo
metadata out of Store
API, if both Rules and Store are present.
Personally I like 2 the most, as it treats its API completely separately, but up to you all as you know more about this, happy to go forward with whatever solution! 🎉
Follow up questions: Do we ever expect rules API to be there, but not store API? What guarantees do we give to the users, when migrating between versions?
Thanks for the answers and reading! 🤗
Come to think of it, the scalable rule storage proposal will cause us to have Rule APIs without a Store API, so I also think 2 is the best choice, so we can treat them entirely separately. Taking this a bit further, did we even truly need external labels in the rule deduplication? Wouldn't any identifier that uniquely identifies a Rule API be sufficient (if so, it appears to me that the IP/address of the Rule API would be enough, thus no Info call necessary at all at this point)? It's entirely possible that I'm missing something, but it appears to me that we had the external labels, and just used that as our hammer.
In any case, option 2 is what I feel is the best.
Hm. I would go for 4 (unless one of those is what is 4 to me):
Why not:
rpc Info
proto servicestore StoreInfo
fields of InfoResponse
if separation is needed?This allows having separation, clear mechanism, clean code, and backward & forward compatibility, no?
I might be missing why this direction will not work?
@bwplotka I think 4. is 1. :) But that would mean they all should have the same metadata in the Info API, or at least some common ones? But from what I understand not all have the same metadata, and judging by @brancz comment when rules API is used separately from store API, what would that metadata would the info API expose?
In case we go with this approach, what should be required in that Info service, which metadata and what would be optional?
Well 1 means only store would have info - to me all of them would need some sort of metadata propagation sooner or later. To me unified Info endpoints means just clarity and less complexity since we have unified metadata propagation. Additionally better flexibility for future. What's the benefit of having different logic fir each kind of mix of APIs?
But let's try to "guess" what we might have:
So StoreAPI is clear:
RulesAPI:
MetadataAPI?
Exemplars
By just yolo listing all APIs I see the commonality here , that's why instinct says to me that Info
should be single one, reused across all APIs.
The extra benefit is mixed APIs e.g StoreAPI and Exemplars we can then have reused LabelSet, Min Max, StoreType (I think it makes sense to reuse those), plus only one propagation communication channel not two.
Nice, thanks for the details! I guess having two metadata that are required/common throughout all the info APIs, makes sense, as we have the StoreType and LabelSets. I am guessing the StoreType would now be ComponentType instead?
Also for migration purposes, we will fallback to a different discovery if no Info API exists, as to ensure a smooth migration?
Also for migration purposes, we will fallback to a different discovery if no Info API exists, as to ensure a smooth migration?
In what sense? Currently every component that is used by querier have InfoAPI implemented and served.
@bwplotka hmm, I meant more like during upgrade path, old thanos component that has not been upgraded (e.g. sidecar) would not have the new Info API, and we would need to fallback to what we currently do, in case of query that is using info from store. But not sure I follow what you mean by has info API, rules does not, or am I wrong here? 🤔
I think I made the wrong call earlier, I think maybe it is time for a design doc. It's getting rather hard to follow what's suggestion, and what's proposed at this point.
FWIW if I understand correctly, there is an Info API in the room that any component would expose. I don't really like this idea as it doesn't make sense for a component that only serves the Rule API for example to expose an API that seemingly returns min/max time. This feels like putting things into the same API because of accidental overlap. Feels to me like the rule of "prefer duplication over the wrong/premature abstraction".
I think I made the wrong call earlier, I think maybe it is time for a design doc. It's getting rather hard to follow what's suggestion, and what's proposed at this point.
Fine by me, do you want me to create one? If yes, which proposed solution, or should I outline both of them?
@lilic in design doc (we do them on Github), please check ideally, we can have the proposed one chosen by you and other in ## alternatives
section.
This feels like putting things into the same API because of accidental overlap. Feels to me like the rule of "prefer duplication over the wrong/premature abstraction".
I don't think premature abstraction is when we already know external labels are used right now in Rules API and we have concrete example of need of those in first iterations of Metadata and Exemplaers.
I would call it premature duplication of effort and introducing complexity of migrations all over the place.
It has to be one propagation pipeline, can't see other way.
Nothing bad would be to have a similar technique as we have with hints: https://github.com/thanos-io/thanos/blob/c73143d131953fe8daa7975f3ba3dfd7a9785073/pkg/store/storepb/rpc.proto#L109 or something.
Looks like a need to for design yes (:
hmm, I meant more like during upgrade path, old thanos component that has not been upgraded (e.g. sidecar) would not have the new Info API,
hmm, yea my idea was that even if you separate InfoMethod to separate gRPC service, this does not break existing StoreAPIs as long as we keep same proto format of message. But it might not work like that, worth to check.
Let's discuss it on a design doc. We're not that far off from each other. I was not talking about label-sets, in my last comment I agreed that label-sets are in common and that there could be a common API for it. Min/max time is an example that I'm talking about that does not universally apply.
Looks like design doc is merged, so help wanted to implement this 🤗
I can get started in roughly a week or two. If no one starts that by then I can take it. Does that sound good?
Hi @lilic! Have you started working on this? Because we are also planning to work on this proposal during my LFX mentorship 🙂 //cc @prmsrswt @squat
Feel free to take it! If you have any questions around the proposal feel free to reach out!
@hitanshu-mehta are you currently implementing this proposal? If so let's add you as the assignee of this ticket so it is 'official' :)
@ianbillett Yes, I have started implementing the proposal. Here is the draft PR :)
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind
command if you wish to be reminded at some point in future.
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind
command if you wish to be reminded at some point in future.
Closing for now as promised, let us know if you need this to be reopened! 🤗
Just to keep tabs on this, since now https://github.com/thanos-io/thanos/pull/4282 has been merged, what's still missing? Some points:
--endpoint-strict
similarly to --store-strict
(taken by @hitanshu-mehta: https://github.com/thanos-io/thanos/pull/4874 )cc @hitanshu-mehta - as I said if you'd like a hand with these, happy to help
Something decided on our public meeting: https://docs.google.com/document/d/165L9DVKHW0sS2PfKYsUCDikwq5XngKkCqTOVjzC6qQE/edit#heading=h.7dcto2wgvh4q
Something to do in later step. This will avoid us to have all those weird flags like separate SD, separate store etc. We can disable certain APIs if needed on API level (e.g ruler can disable RulesAPI if needed) (:
cc @s-urbaniak