Open daviddyball opened 5 years ago
Definitely going to try and throw something together for this.
Ideas so far are:
a) use JDBC-like URLs to configure individual stores (e.g. `grpc+tls://<addr>:<port>/?Option1=y&Option2=n`)
or
b) use file-based configuration like the object store.
@bwplotka If I go for option "b" should I re-use the object-store file config stuff, or are you happy for me to implement a query-store specific config file?
So I think we talked about it here: https://github.com/improbable-eng/thanos/pull/899
I would say for this specific use case, let's start with option B, and essentially limit scope by saying `--store` flags are explicitly out of this feature.
> I re-use the object-store file config stuff
What? We should never mix object store stuff with discovery stuff. In fact, we already have FileSD logic which is compatible with Prometheus FileSD. It allows arbitrary labels, so we can easily implement this:
One solution would be to do that work in the FileSD config and have labels such as tls_server_name to override TLS config.
I would say this is the way to go. What do you think? (:
Thanks for proposing this and sorry for massive delay!
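As a rough sketch of that idea (nothing here is implemented), a Prometheus-compatible file-SD entry carrying a hypothetical `tls_server_name` label might look like:

```yaml
# Hypothetical sketch only: Prometheus-style file SD for Querier store targets.
# The tls_server_name label is an illustrative convention, not an existing feature.
- targets:
    - "thanos-sidecar.monitoring.svc:10901"   # in-cluster, plaintext (no TLS labels)
- targets:
    - "query.cluster1.example.com:443"        # cross-cluster, would use TLS
  labels:
    tls_server_name: "query.cluster1.example.com"
```

The discovery file format itself is the standard Prometheus file-SD shape; only the special labels and their interpretation by Querier would be new.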
@bwplotka Apologies, I meant I can look at writing some code similar to what is in place for FileSD, to implement `FileStoreConfig` or something :-P Definitely don't want to mix the two 😆
Yea, I would say no specific config, just special labels that we document properly in our docs.
IMO people could just run an envoy/nginx/etc. proxy and handle TLS outside of Thanos. Then we don't have to work on cert reloading, all the different options, etc.
@bwplotka wdyt?
I attempted to use nginx, but it appears that the grpc call is not sending a host header. Can anyone confirm this is the case? (Or at least anyone with better knowledge of the code and go than myself)
Why does it need that?
I think you should be able to proxy all the traffic to the gRPC backend.
Something like:
```nginx
http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent"';

    server {
        listen 80 http2;

        access_log logs/access.log main;

        location / {
            # Replace localhost:50051 with the address and port of your gRPC server
            # The 'grpc://' prefix is optional; unencrypted gRPC is the default
            grpc_pass grpc://localhost:50051;
        }
    }
}
```
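For the TLS side, a terminating variant is sketched below. This is illustrative only: the certificate paths and listener port are placeholders, 10901 is Thanos' default gRPC port, and a dedicated listener is assumed rather than SNI-based virtual hosts.

```nginx
server {
    # Dedicated TLS listener on its own port; avoid relying on SNI-based
    # virtual hosts, as the client may not send SNI/host information.
    listen 8443 ssl http2;

    ssl_certificate     /etc/nginx/tls/server.crt;  # placeholder paths
    ssl_certificate_key /etc/nginx/tls/server.key;

    location / {
        # Terminate TLS here and forward plaintext gRPC to the local StoreAPI
        grpc_pass grpc://localhost:10901;
    }
}
```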
In my case I would like to park the grpc on a virtual host, instead of running a special nginx instance just for grpc.
I don't think we pass host headers, nor do we have an option to configure them. So right now I don't think it will work.
@mleklund Try getambassador.io for a nice gRPC proxy. It's what we ended up using to get around the problem of not being able to mix TLS and non-TLS stores.
@povilasv I'm not sure I understand your objection to adding this feature. Do you think it serves no purpose?
It's not an objection, I just wanted to point out that it won't work right now.
Whether we want to add a host header or not is a separate discussion, and it will take a bit of time.
BTW, a lot of people right now run Istio/Linkerd, the sidecar-per-service approach, so I think it's not a widespread problem.
Up to this point nginx ingress has worked for us, and I would rather not complicate things any more than I need to. I have been looking at Envoy and Ambassador, and it may be a good fit down the road. For the moment I am using port-based TCP for nginx ingress, since most of our use cases are behind a VPN or firewall.
I was curious if I was missing something or if there were plans to pass header info since grpc is built on top of http(2).
@daviddyball, I'm not sure if I understand your use case correctly, but does this PR https://github.com/improbable-eng/thanos/pull/508 solve your problem?
There are flags like `--grpc-server-tls-cert`/`--grpc-client-tls-cert` added by this PR in the query doc and other components' docs.
@benjaminhuo I don't think it does solve the problem here. The issue is that I'd like `query` to talk to different stores in different ways (e.g. TLS vs non-TLS), and there is no way to configure TLS per store endpoint.
For now we won't be implementing per-store TLS configuration. Our general suggestion is to run a sidecar proxy (Envoy, nginx, etc.).
Hopefully we will update docs soon. We will close the ticket once the docs are there.
I would also like something like this.
My idea, which may be wrong, is:
- `query` instances on the central cluster talk to `query` instances in other clusters over TLS
- `query` instances on the central cluster also talk to the local cluster's store and sidecars
ideally, I would like to define something like:

```
- query.cluster1.foobar.com:443
- query.cluster2.foobar.com:443
- query.cluster3.foobar.com:443
- query.cluster4.foobar.com:443
- dnssrv+_grpc._tcp.thanos-sidecar-grpc.thanos.svc.cluster.local
- dnssrv+_grpc._tcp.thanos-query-grpc.thanos.svc.cluster.local
```
so, it seems that without setting `--grpc-client-tls-secure` the secure endpoints won't work, and if I enable it the `dnssrv` records won't work :(
as far as I understand there is no way of doing this right now.
> Try getambassador.io for a nice gRPC proxy. It's what we ended up using to get around the problem of not being able to mix TLS and non-TLS stores.
@daviddyball I'm considering using Ambassador as an edge proxy in each cluster too. Ambassador has TLS-termination ability, and we can use it to accept TLS connections from the central monitoring cluster.
But for the central monitoring cluster, is it possible for Ambassador to originate TLS connections to the remote cluster (egress gRPC TLS traffic)?
Would you share your config? Thanks.
@bwplotka @caarlos0, I found a blog post https://improbable.io/blog/thanos-architecture-at-improbable saying:
> In terms of requests to a remote cluster, Envoy has been used securely to proxy our request between many clusters; meaning that a request will go via an Envoy sidecar, an edge Envoy egress proxy, and over the public internet to an edge Envoy ingress proxy (all over a secure connection). This latter will then forward the request onto the Thanos Sidecar to retrieve the data for the time period and labels specified. All of this is done using server streaming gRPC.
Just wondering where I can find envoy config examples like this for ingress and especially egress tls traffic?
Thanks very much Ben
@benjaminhuo I had `thanos-query` talking directly to `ambassador`, so didn't use Ambassador for outbound. You'd probably want to use Envoy directly for that part, no?
Yeah, I'm considering using Envoy to manage outbound TLS.
Then you'd have the same TLS client config for all the stores specified in Thanos query, including the one in the same cluster as Thanos query?
We're not validating TLS at the moment @benjaminhuo, so there's no TLS config on `query`, just a list of `store` endpoints :+1:
I see :)
It looks like there is a mix of things in this feature request.
So is it about the host header, or per-store TLS config?
Per-store TLS config: the ability to specify TLS for some stores and disable it for others. We've since worked around the issue, so if it doesn't fit with the intended design of Thanos, then it can be closed as Won't Fix.
:+1:
How did you work around it? If you quickly describe it, that would be awesome for users (:
I guess some forward proxy?
We ended up switching off TLS for the time being and using Ambassador as a proxy.
The eventual goal is to use cross-cluster services via Istio, so we can just go direct without proxies.
We ran into this issue with Thanos in Kubernetes. We have stores in the cluster we wanted to communicate with in plaintext, and others in remote (bare-metal) datacenters with which we wanted to use TLS.
We worked around the problem with one Envoy pod per remote datacenter, terminating the TLS connection in the cluster hosting the Thanos `query`. Each is tagged like a local store, to be automatically picked up by Thanos' dnssrv discovery mechanism.
Still, it would probably be simpler to do away with the proxies and let Thanos speak directly (and securely) with the remote endpoints.
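A minimal sketch of such a per-datacenter Envoy pod, assuming Envoy's v3 API and a hypothetical remote endpoint `store.dc1.example.com:443`; Querier would then be pointed at the local port 10901 as if it were an in-cluster store:

```yaml
# Illustrative only: plaintext listener in-cluster, TLS originated to the remote DC.
static_resources:
  listeners:
    - name: dc1_store
      address:
        socket_address: { address: 0.0.0.0, port_value: 10901 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.tcp_proxy
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                stat_prefix: dc1_store
                cluster: dc1_store
  clusters:
    - name: dc1_store
      type: LOGICAL_DNS
      connect_timeout: 5s
      load_assignment:
        cluster_name: dc1_store
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: store.dc1.example.com
                      port_value: 443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          sni: store.dc1.example.com
```

Using a plain TCP proxy keeps the gRPC stream opaque to Envoy; the `sni` field handles the server-name problem discussed earlier in the thread, since Envoy (not the Thanos client) sets it on the upstream connection.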
@jqueuniet I'm running into the exact same issue. I've been struggling to get a working Envoy configuration in place. Would you be open to sharing your solution here or discussing it privately? Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Any update? Why did this issue become stale? I have the exact same issue...
It became stale as no one said anything for a long time (: I marked this thread as pinned, as it's not solved and it's a fair feature request.
We recently added a way to use different TLS settings for the Ruler's Query APIs and Alertmanagers, so we might want to follow the same logic and allow that for Querier. Help wanted (:
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.
We want this (:
Curious how far along this is? I need to add the endpoints that are in my other clusters, and the only way to do this was to enable the grpc-tls flag on the Query deployment. This rendered the local instances unreachable. Sure, I can turn these into ingresses themselves, but it would be great to just use this functionality.
@bwplotka is there anyone working on this?
if no, I would like to give it a try 🙏
I don't think anyone is on it at the moment. It's not easy though (: We probably need something like this: https://github.com/bwplotka/thanos/blob/3b71c2d59a013198f5eff83143bc7eddc1b7c4f0/cmd/thanos/rule.go#L104 but for gRPC on Querier.
I've got the base of this done: https://github.com/G-Research/thanos/pull/1
I still need to work on e2e tests - happy to redirect the PR here if you're interested.
Have popped some questions on Slack around guidance on where people would like to go with this.
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.
> I attempted to use nginx, but it appears that the grpc call is not sending a host header. Can anyone confirm this is the case? (Or at least anyone with better knowledge of the code and go than myself)

The same issue is happening here. Since we are using nginx with "virtual hosts" to handle traffic in the K8s clusters, sending no host header is a big problem.
Any advance or idea about how this could be solved?
Thanks
That is what I saw; the gRPC implementation they are using does not send SNI information, as near as I can tell.
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the `remind` command if you wish to be reminded at some point in future.
This would still be incredibly useful, as we have had to add additional infrastructure and config in order to force all our endpoints to use SSL, even when the API was in the same cluster and would normally use local endpoints.
It would be ideal to be able to set the gRPC client config per store, possibly via a config file, or else via multiple flags somehow.
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the `remind` command if you wish to be reminded at some point in future.
This is required to allow sidecars to share load balancers, reverse proxies, or ingress controllers with other services.
It would be great to fix it :D
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the `remind` command if you wish to be reminded at some point in future.
I would like to emphasize the importance of this issue, because fewer load balancers means less money spent on infrastructure.
Pinging this issue again... This would be a highly desirable feature.
What happened to the PR that was for it? (: Looks like it got abandoned? :thinking:
This means help wanted still, definitely high on the priority list, thanks for pinging!
@eswdd indicated on March 6th that he was trying to get feedback on PR https://github.com/G-Research/thanos/pull/1. Had there been any feedback from the Thanos maintainers about whether that approach had their blessing? I'm asking on behalf of anyone who would be interested in picking up where that PR left off...
We definitely missed that. I looked now, and it looks like a good direction, but it might need a formal proposal or just more discussion on potential alternatives.
We need to remember that Querier does not only target StoreAPI endpoints, but also RulesAPI targets, and in the future much more, e.g. TargetsAPI etc...
So far we also have multiple options to configure Querier targets:

- `--store`
- `--store-strict`
- `--store.sd-files`
- `--rule` (hidden)
By introducing yet another one, we just go into a more ambiguous configuration style. Instead, in order to achieve per-target TLS options, we might need a total overhaul of our flags, and most likely introduce just a single config file: `--targets.config` (and `--targets.config-file`), which would expect a YAML config very similar to Prometheus' ScrapeConfig, but for Querier targets and gRPC, not HTTP. There you would be able to configure things like:

- `tls_config`
- `proxy_url`
- `basic_auth`
- `metadata`
- `file_sd_configs`
- `static_configs`
- `dns_sd_configs`
- `strict` or not

Open question: do we want to keep the older flags like `--store`, `--store-strict`, `--store.sd-files`, `--rule` (hidden)? I think I would remove `--store-strict`, `--store.sd-files` and `--rule` (the more complex ones), but keep `--store` for very basic scenarios and compatibility.
I will take it to our Thanos Community Sync (you are welcome to join! https://thanos.io/v0.13/contributing/community.md/#thanos-community-meeting) and maybe transfer this to formal proposal so everyone is aware of this change.
Initial thoughts? Would that work for everyone? (:
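To make the shape concrete, here is a purely illustrative sketch of what such a `--targets.config` YAML could look like; the field names are borrowed from Prometheus' scrape_config, and none of this is implemented:

```yaml
# Hypothetical --targets.config sketch; every field name here is illustrative.
targets:
  - static_configs:
      - "thanos-sidecar.monitoring.svc:10901"   # in-cluster, plaintext
    strict: true
  - dns_sd_configs:
      - "dnssrv+_grpc._tcp.thanos-query-grpc.thanos.svc.cluster.local"
  - static_configs:
      - "query.cluster1.example.com:443"        # cross-cluster, TLS
    tls_config:
      server_name: "query.cluster1.example.com"
```

Each entry would carry its own TLS (and other gRPC client) options, which is exactly the per-store granularity this issue asks for.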
**Request**

Feature request to add support for per-store TLS configuration.
**Use Case**

I have `store` running alongside `query` on my main "monitoring" cluster, and we don't use TLS for comms between pods on the same cluster. I also want to pull metrics from my remote Prometheus instances, which exist on different clusters, but those endpoints are TLS-encrypted because they are cross-cluster.

**Suggestion**

Change the `--store` argument to use a more definitive style for remote endpoints, e.g.:

- `grpc://<ip-address>:<port>`
- `grpc+tls://<ip-address>:<port>`

This would allow specifying per-store TLS settings and allow a combination of secure and insecure comms for store endpoints.
Related PR from @adrien-f : https://github.com/improbable-eng/thanos/pull/899
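To illustrate the suggestion above, a small Go sketch of how such scheme-prefixed store flags might be parsed. The `storeTarget` type, `parseStoreURL` function, and `serverName` query parameter are all hypothetical (mirroring the `?Option1=y` style from the suggestion), not Thanos code:

```go
package main

import (
	"fmt"
	"net/url"
)

// storeTarget is a hypothetical struct holding per-store connection settings.
type storeTarget struct {
	Addr       string // host:port to dial
	UseTLS     bool   // grpc+tls:// vs grpc://
	ServerName string // optional TLS server name override
}

// parseStoreURL maps grpc:// and grpc+tls:// URLs onto per-store settings.
func parseStoreURL(raw string) (storeTarget, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return storeTarget{}, err
	}
	t := storeTarget{Addr: u.Host}
	switch u.Scheme {
	case "grpc":
		// Plaintext gRPC; nothing extra to configure.
	case "grpc+tls":
		t.UseTLS = true
		// serverName is an invented example option for SNI override.
		t.ServerName = u.Query().Get("serverName")
	default:
		return storeTarget{}, fmt.Errorf("unsupported store scheme %q", u.Scheme)
	}
	return t, nil
}

func main() {
	for _, raw := range []string{
		"grpc://10.0.0.1:10901",
		"grpc+tls://query.cluster1.example.com:443?serverName=query.cluster1.example.com",
	} {
		t, err := parseStoreURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("addr=%s tls=%v serverName=%q\n", t.Addr, t.UseTLS, t.ServerName)
	}
}
```

The scheme cleanly separates secure and insecure targets in one flag list, which is what the mixed in-cluster/cross-cluster setups in this thread need.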