thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

thanos continuously restarts (with SIGTERM) when TLS config / basic auth config defined using --http.config #6361

Open hgsat123 opened 1 year ago

hgsat123 commented 1 year ago

Thanos, Prometheus and Golang version used:

  Thanos: 0.30.0
  Prometheus: 2.35
  Golang: GoVersion:"go1.16.2", Compiler:"gc", Platform:"darwin/amd64"

Object Storage Provider: local - bithub

What happened: I'm trying to set up Thanos Receiver as the remote-write endpoint for both local and cloud-hosted Prometheus instances. The receiver is deployed in an on-premises k8s environment running on a private cloud, so both the local and the cloud-based Prometheus need secure access to it. I was able to test the receiver's remote-write endpoint without TLS or basic authentication. To test TLS / basic auth, I created client-specific certs and a CA using a tool, created a configmap from the template below, and mounted the configmap on the thanos-receiver pod so that the file (config.yaml) can be passed to --http.config.

  apiVersion: v1
  data:
    config.yaml: |
      tls_server_config:
        cert_file: /certs/server.pem
        key_file: /certs/server.key
        client_auth_type: RequireAndVerifyClientCert
        client_ca_file: /certs/client-ca.pem
      basic_auth_users:
        satish:
  kind: ConfigMap
  metadata:
    name: thanos-http-conf
    namespace:

What you expected to happen: Thanos should validate these TLS certs during start-up and start the receiver service successfully. Instead, it keeps restarting because it receives a termination signal (signal=terminated in the logs, as if kill had been run against the Thanos process).

How to reproduce it (as minimally and precisely as possible): Define a TLS config or a basic_auth_users config as part of thanos-http-config and attempt to start Thanos. It always fails with "signal received". I also tried keeping only a basic_auth_users entry, with just my own user and the bcrypt hash of a sample password ("hello"), and passed that config file to --http.config. The result was the same.

Full logs to relevant components: thanos-receiver pod logs below

  level=debug ts=2023-05-06T06:06:26.27502507Z caller=receive.go:637 component=receive component=uploader msg="upload phase done" uploaded=0 elapsed=1.57144ms
  level=debug ts=2023-05-06T06:06:56.272225317Z caller=receive.go:629 component=receive component=uploader msg="upload phase starting"
  level=debug ts=2023-05-06T06:06:56.272862925Z caller=multitsdb.go:379 component=receive component=multi-tsdb msg="uploading block for tenant" tenant=default-tenant
  level=debug ts=2023-05-06T06:06:56.274729099Z caller=receive.go:637 component=receive component=uploader msg="upload phase done" uploaded=0 elapsed=1.876251ms
  level=info ts=2023-05-06T06:07:25.482602924Z caller=main.go:172 msg="caught signal. Exiting." signal=terminated
  level=warn ts=2023-05-06T06:07:25.483264688Z caller=intrumentation.go:67 component=receive msg="changing probe status" status=not-ready reason=null
  level=info ts=2023-05-06T06:07:25.483326322Z caller=http.go:91 component=receive service=http/server component=receive msg="internal server is shutting down" err=null
  level=info ts=2023-05-06T06:07:25.483430871Z caller=receive.go:566 component=receive msg="shutting down storage"
  level=info ts=2023-05-06T06:07:25.483452317Z caller=multitsdb.go:231 component=receive component=multi-tsdb msg="flushing TSDB" tenant=default-tenant
  level=info ts=2023-05-06T06:07:25.483853734Z caller=http.go:110 component=receive service=http/server component=receive msg="internal server is shutdown gracefully" err=null
  level=info ts=2023-05-06T06:07:25.483918841Z caller=intrumentation.go:81 component=receive msg="changing probe status" status=not-healthy reason=null
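When sifting through longer receiver logs for the moment the process is told to stop, it can help to parse the logfmt lines and pick out the entries that carry a `signal` field. The following is a minimal sketch in Python, not part of Thanos; the function names `parse_logfmt` and `signal_events` are made up for illustration, and it assumes every line is plain logfmt with double-quoted values, as in the logs above.

```python
import shlex


def parse_logfmt(line):
    """Parse one logfmt line into a dict (a repeated key keeps its last value)."""
    record = {}
    # shlex.split keeps quoted values such as msg="caught signal. Exiting." together.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            record[key] = value
    return record


def signal_events(lines):
    """Return the parsed records that carry a `signal` field."""
    records = (parse_logfmt(line) for line in lines if line.strip())
    return [record for record in records if "signal" in record]


if __name__ == "__main__":
    # Two lines copied from the pod logs in this report.
    sample = [
        'level=debug ts=2023-05-06T06:06:56.274729099Z caller=receive.go:637 '
        'component=receive component=uploader msg="upload phase done" '
        'uploaded=0 elapsed=1.876251ms',
        'level=info ts=2023-05-06T06:07:25.482602924Z caller=main.go:172 '
        'msg="caught signal. Exiting." signal=terminated',
    ]
    for event in signal_events(sample):
        print(event["ts"], event["signal"], event["msg"])
```

In the logs above, the only entry with a `signal` field is the one from main.go:172, and every later entry is part of the shutdown that follows it.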

Anything else we need to know: Thanos v0.30.0 from thanosio/thanos was copied onto an Oracle Linux image (client specific) and pushed to a local Docker registry (client specific). I cannot share these details.

hgsat123 commented 1 year ago

Please note, my thanos-receiver has the following args passed:

  affinity: {}
  containers:
  - args:
    - $(CMD_TYPE)
    - --log.format=logfmt
    - --log.level=debug
    - --grpc-address=0.0.0.0:10901
    - --http-address=0.0.0.0:10902
    - --remote-write.address=0.0.0.0:19291
    - --receive.replication-factor=1
    - --tsdb.path=/var/thanos/receive
    - --receive.local-endpoint=${NAME}.thanos-receive-default.${NAMESPACE}.svc.cluster.local:10901
    - --label=receive_replica="${NAME}"
    - --label=receive="true"
    - --tsdb.retention=1d
    - --http.config=/conf/config.yaml

This config.yaml comes from the configmap, which is mounted at the /conf directory on the thanos-receiver pod. The TLS certificates are created as Kubernetes secrets and mounted at the /certs directory on thanos-receiver, so the same certs (.pem and .key) are referenced from the /certs path in the http_config configmap.
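Since the config file and the certificates come from two separate mounts, one quick sanity check is to confirm that every `*_file` path named in config.yaml is actually readable from inside the pod. Below is a rough Python sketch of such a check. It is only a debugging aid, not something Thanos provides; the helper names `referenced_files` and `unreadable_files` are hypothetical, and it assumes simple unquoted `key: /path` lines rather than doing full YAML parsing.

```python
import os
import re

# Matches simple "some_file: /path" lines such as cert_file, key_file, client_ca_file.
_FILE_KEY = re.compile(r"^\s*(\w+_file):\s*(\S+)\s*$")


def referenced_files(config_text):
    """Return (key, path) pairs for every *_file entry in the config text."""
    pairs = []
    for line in config_text.splitlines():
        match = _FILE_KEY.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def unreadable_files(pairs):
    """Return the pairs whose path is not a readable regular file."""
    return [
        (key, path)
        for key, path in pairs
        if not (os.path.isfile(path) and os.access(path, os.R_OK))
    ]


if __name__ == "__main__":
    config_path = "/conf/config.yaml"  # mount path described in this report
    if os.path.isfile(config_path):
        with open(config_path) as handle:
            problems = unreadable_files(referenced_files(handle.read()))
        for key, path in problems:
            print(f"{key}: cannot read {path}")
        if not problems:
            print("all referenced files are readable")
    else:
        print(f"{config_path} not found")
```

This only shows whether the files are present and readable at the mounted paths. It does not verify that the certificate and key match or that the CA is valid.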