thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Redis client can't connect to server #6018

Open Supporterino opened 1 year ago

Supporterino commented 1 year ago

Discussed in https://github.com/thanos-io/thanos/discussions/6013

Originally posted by **Supporterino** January 3, 2023

Hello guys, I am just updating my thanos stack to `v0.30.0` and want to switch over to redis as the cache provider. I set up a redis cluster on version v7 with the bitnami helm chart. I am using the following cache configuration (as example query-range cache):

```yaml
config:
  addr: "redis-redis-cluster-0.redis-redis-cluster-headless:6379,redis-redis-cluster-1.redis-redis-cluster-headless:6379,redis-redis-cluster-2.redis-redis-cluster-headless:6379"
  password: "SECURE-PASSWORD"
  db: 0
  dial_timeout: 5s
  read_timeout: 3s
  write_timeout: 3s
  pool_size: 100
  min_idle_conns: 10
  idle_timeout: 5m0s
  max_conn_age: 0s
  max_get_multi_concurrency: 100
  get_multi_batch_size: 100
  max_set_multi_concurrency: 100
  set_multi_batch_size: 100
  tls_enabled: false
  cache_size: 1GiB
type: "REDIS"
```

But my redis instance isn't getting any load and the query frontend just logs the following:

```
level=error ts=2023-01-03T10:41:45.063055745Z caller=redis_cache.go:46 msg="error connecting to redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:41:45.346637454Z caller=redis_cache.go:46 msg="error connecting to redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=info ts=2023-01-03T10:41:45.347196342Z caller=query_frontend.go:339 msg="starting query frontend"
level=info ts=2023-01-03T10:41:45.347215738Z caller=intrumentation.go:56 msg="changing probe status" status=ready
level=info ts=2023-01-03T10:41:45.347303382Z caller=intrumentation.go:75 msg="changing probe status" status=healthy
level=info ts=2023-01-03T10:41:45.34734474Z caller=http.go:73 service=http/server component=query-frontend msg="listening for requests and metrics" address=0.0.0.0:9090
level=info ts=2023-01-03T10:41:45.347625667Z caller=tls_config.go:232 service=http/server component=query-frontend msg="Listening on" address=[::]:9090
level=info ts=2023-01-03T10:41:45.347645684Z caller=tls_config.go:235 service=http/server component=query-frontend msg="TLS is disabled." http2=false address=[::]:9090
level=error ts=2023-01-03T10:45:24.948875993Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.407642957Z caller=redis_cache.go:103 msg="failed to put to redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.426421268Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.519582802Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.612233941Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
level=error ts=2023-01-03T10:45:25.709052914Z caller=redis_cache.go:75 msg="failed to get from redis" name=redis err="got 4 elements in cluster info address, expected 2 or 3"
```

What exactly am I missing?

yeya24 commented 1 year ago

Answered this in that discussion post. It is related to https://github.com/go-redis/redis/issues/2085. Redis v7 requires go-redis v9, whereas Thanos currently uses go-redis v8.

To support this use case we should upgrade the go-redis library. However, v9 does not appear to be backward compatible, so upgrading would break Redis 6.x.

Supporterino commented 1 year ago

Temporarily I downgraded our redis cluster to 6.2.8 since it is only used for thanos. Now it is working like a charm. Ty for your help. It might be useful to add a short note about this to the redis cache section of the docs.

kforsthoevel commented 1 year ago

Why does the Store Gateway work with Redis 7 while the Query Frontend does not?

Schmitze333 commented 1 year ago

Would it be an option to use this Redis client (https://github.com/rueian/rueidis) via the cacheutils internal package?

yeya24 commented 1 year ago

Yeah it would be great to use the same rueidis client in query frontend redis cache as well.

Schmitze333 commented 1 year ago

@yeya24 I'm working on a PR to use rueidis as the Redis client in the query-frontend as well, but something puzzles me with regard to the Redis configs. I wonder whether this issue is the right place to discuss it, or rather a WIP PR.

michalschott commented 1 year ago

Hi,

Recently tried to enable redis cache for query-frontend component - it failed with this error:

{"caller":"redis_cache.go:75","err":"ERR unknown command 'select', with args beginning with: '1' ","level":"error","msg":"failed to get from redis","name":"redis","ts":"2023-06-07T15:58:53.129616978Z"}

@douglascamata suggested there might be incompatibility between client and server, so I tried these redis versions but none of them succeeded (same error):

I was unable to test with <6.0 because the operator I'm using to deploy redis to k8s does not support such old versions ;)

Thanos 0.31.0

douglascamata commented 1 year ago

After some back and forth with @michalschott in Slack, he found out that most of his problems come from using a Redis Cluster for HA.

So for anyone out there using Redis Cluster: you have to leave the DB unset, otherwise it'll fail with an error like so: "ERR SELECT is not allowed in cluster mode", which comes from the DB selection command.
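As a rough illustration of the advice above, a query-frontend cache configuration for a Redis Cluster might look like the sketch below. The field names are the ones from the configuration in the original post; the address and password are placeholders, and this has not been verified against any particular Thanos release.

```yaml
# Sketch only: field names come from the configuration posted at the top of
# this thread; the address and password below are placeholders.
type: REDIS
config:
  addr: "redis-cluster.example.svc:6379"  # hypothetical address
  password: "SECURE-PASSWORD"
  # `db` is intentionally left out: per the comment above, setting it makes the
  # client issue the DB selection command, which Redis Cluster rejects.
  dial_timeout: 5s
  read_timeout: 3s
  write_timeout: 3s
  tls_enabled: false
```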

dschaaff commented 1 year ago

I get errors using a v6 redis cluster with query frontend even with the db unset. Example errors:

msg="failed to get from redis" name=redis err="MOVED 10784 10.0.200.62:6379"
msg="failed to put to redis" name=redis err="EXECABORT Transaction discarded because of previous errors."

This occurs when pointing query frontend at the same AWS Elasticache cluster I use for the store component. Happy to open a separate issue if needed.

douglascamata commented 1 year ago

Hey folks, can you try again after https://github.com/thanos-io/thanos/pull/6520 got merged? Should be fixed, I believe.

calvinbui commented 7 months ago

> Hey folks, can you try again after #6520 got merged? Should be fixed, I believe.

Not working for me with the exact same config as the store gateway.