Open vaillani opened 1 year ago
The Redis cache configuration accepts a CA file; see the official docs: https://thanos.io/tip/components/store.md/#redis-index-cache
Thanks for the answer. I have already tried this: the ca-certificates package ships 4 Amazon Root certificates, so I merged them into a single file and added its path to tls_config:
--query-range.response-cache-config=
config:
  addr: XXX.cache.amazonaws.com:6379
  tls_enabled: true
  tls_config:
    ca_file: /etc/ssl/certs/ca-cert-Amazon_Root_CA.pem
type: "redis"
I got the same issue: creating redis client: context deadline exceeded
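Since the merged bundle is a hand-made file, it may be worth confirming that it really contains all four certificates before ruling out the CA side. The following is only a minimal sketch using the Go standard library; the helper name certSubjects and the command-line usage are made up for illustration and are not part of Thanos.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
)

// certSubjects returns the subject of every CERTIFICATE block found in a
// PEM bundle. Non-certificate blocks are skipped; bytes that are not PEM
// at all yield an empty result.
func certSubjects(bundle []byte) ([]string, error) {
	var subjects []string
	rest := bundle
	for {
		var block *pem.Block
		block, rest = pem.Decode(rest)
		if block == nil {
			break // no further PEM blocks
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, fmt.Errorf("parsing certificate %d: %w", len(subjects)+1, err)
		}
		subjects = append(subjects, cert.Subject.String())
	}
	return subjects, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: listcerts <bundle.pem>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading bundle:", err)
		os.Exit(1)
	}
	subjects, err := certSubjects(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d certificate(s) in bundle\n", len(subjects))
	for _, s := range subjects {
		fmt.Println(" -", s)
	}
}
```

Running it against the merged file should list one subject per certificate; a lower count than expected would point at a broken merge rather than at Thanos.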
You can use the insecure option to skip the cert check. 👀
Otherwise unfortunately I can't help anymore, I don't have this kind of setup.
I also tried skipping the certificate check with the option insecure_skip_verify: true, but it doesn't seem to have any impact on my issue.
Have you already tried without in-transit encryption? This seems weird... it looks like a timeout somewhere.
Yes, for the moment we use ElastiCache Redis without TLS encryption and it works well.
I'm also experiencing this issue. I have redis configured with TLS, but query-frontend cannot connect to it.
I dug into the code a bit, and from what I can tell I don't think TLS has been implemented yet? https://github.com/thanos-io/thanos/blob/main/pkg/queryfrontend/config.go#L162
The NewCacheConfig parser for Redis doesn't seem to pass any TLS options over to the cortex cache config, so it doesn't get enabled.
As a quick test, I made the following change:
diff --git a/pkg/queryfrontend/config.go b/pkg/queryfrontend/config.go
index a5655199..80e7f3f0 100644
--- a/pkg/queryfrontend/config.go
+++ b/pkg/queryfrontend/config.go
@@ -166,6 +166,8 @@ func NewCacheConfig(logger log.Logger, confContentYaml []byte) (*cortexcache.Con
         Expiration: config.Expiration,
         DB: config.Redis.DB,
         Password: flagext.Secret{Value: config.Redis.Password},
+        EnableTLS: true,
+        InsecureSkipVerify: true,
     },
     Background: cortexcache.BackgroundConfig{
         WriteBackBuffer: config.Redis.MaxSetMultiConcurrency * config.Redis.SetMultiBatchSize,
After recompiling, query-frontend is able to connect to my Redis using TLS.
Unfortunately my Golang skills are quite limited, so I don't know how to fix this properly.
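For illustration, a less hard-coded version of that test would copy the user's TLS settings across instead of forcing them to true. The sketch below uses two made-up stand-in structs (parsedRedisConfig and targetRedisConfig) rather than the real Thanos or cortex types; only the field names EnableTLS and InsecureSkipVerify are taken from the diff above, and everything else is an assumption.

```go
package main

import "fmt"

// parsedRedisConfig is a hypothetical stand-in for the settings read from
// the user's YAML (tls_enabled, insecure_skip_verify). It is NOT the real
// Thanos struct.
type parsedRedisConfig struct {
	Addr               string
	TLSEnabled         bool
	InsecureSkipVerify bool
}

// targetRedisConfig is a hypothetical stand-in for the cache config that the
// conversion produces. EnableTLS and InsecureSkipVerify mirror the field
// names used in the diff; Endpoint is an illustrative name.
type targetRedisConfig struct {
	Endpoint           string
	EnableTLS          bool
	InsecureSkipVerify bool
}

// toTargetConfig propagates the TLS options instead of dropping them.
// Skipping verification only makes sense when TLS is enabled, so the flag is
// ignored otherwise.
func toTargetConfig(in parsedRedisConfig) targetRedisConfig {
	return targetRedisConfig{
		Endpoint:           in.Addr,
		EnableTLS:          in.TLSEnabled,
		InsecureSkipVerify: in.TLSEnabled && in.InsecureSkipVerify,
	}
}

func main() {
	out := toTargetConfig(parsedRedisConfig{
		Addr:       "redis.example.internal:6379", // placeholder address
		TLSEnabled: true,
	})
	fmt.Printf("%+v\n", out)
}
```

Whether the real conversion can carry a CA file as well depends on what the target config supports, which this sketch does not attempt to answer.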
Would this then be categorized as a bug, since insecure_skip_verify and tls_enabled are not passed down during the conversion to cortexcache.RedisConfig?
I guess so. Any workaround? I'm having the same issue with the Thanos store index cache when applying this configuration:
indexCacheConfig: |
  addr: master.raas-thanos-storegateway.xxx.xxx.cache.amazonaws.com:6379 # Redis address and port
  db: 0 # Redis database (default: 0)
  dial_timeout: 20s # Dial timeout (default: 5s)
  read_timeout: 20s # Read timeout (default: 3s)
  write_timeout: 20s # Write timeout (default: 3s)
  # Concurrency and batch settings
  max_get_multi_concurrency: 50 # Max concurrent GET operations (default: 50)
  get_multi_batch_size: 50 # Batch size for GET operations (default: 50)
  max_set_multi_concurrency: 50 # Max concurrent SET operations (default: 50)
  set_multi_batch_size: 50 # Batch size for SET operations (default: 50)
  # Cache and async settings
  cache_size: 128MB # Cache size (default: 128MB)
  max_async_buffer_size: 5000 # Async buffer size (default: 5000)
  max_async_concurrency: 20 # Async concurrency (default: 20)
  # Circuit breaker settings
  set_async_circuit_breaker_config:
    enabled: true # Circuit breaker enabled (default: true)
    half_open_max_requests: 10 # Max requests during half-open state (default: 10)
    open_duration: 5s # Circuit breaker open duration (default: 5s)
    min_requests: 50 # Min requests before tracking failures (default: 50)
    consecutive_failures: 5 # Consecutive failures to open circuit breaker (default: 5)
    failure_percent: 0.05 # Failure percentage to open circuit breaker (default: 0.05 or 5%)
  # Caching specific items (empty by default)
  enabled_items: [""] # Default: empty (all items cached by default). Possible values: Postings, Series, ExpandedPostings.
  # Time-to-live (TTL) for cached items
  ttl: 30m # TTL for cached index items (default: 30 minutes)
I use the Bitnami Helm chart.
Connection issue when trying to connect Thanos QueryFrontend to an AWS ElastiCache Redis with TLS enabled.
Thanos, Prometheus and Golang version used:
Thanos v0.32.2 with AWS ElastiCache Redis 6.2.6 with encryption in transit enabled
This is the configuration used:
Object Storage Provider: AWS
What happened:
When I tried to connect to an AWS ElastiCache Redis cluster with TLS in transit, I got a connection issue: context deadline exceeded.
I think it is because of missing root certificates, because when I used an Alpine image and installed the root certificates, which include Amazon_Root_CA, it worked well.
I tried to add those certificates with an initContainer but I got the same connection issue.
What you expected to happen:
Thanos QueryFrontend connects successfully to the ElastiCache Redis cluster with TLS.
Full logs to relevant components: