Closed (tsturzl closed this 4 years ago)
As far as I can tell from reading the source, Redix runs the exact same command to get the primary as I do when I use the CLI, and then tries to connect to that host. I'm really lost as to why Redix thinks there is no viable sentinel connection. Further info that may help: I'm running a pool of connections and passing the same sentinel config to each of them, using the name-based pooling approach from the docs (https://hexdocs.pm/redix/real-world-usage.html#name-based-pool). I'm not sure if that changes anything.
This is how I'm spawning the pool: https://gist.github.com/tsturzl/c8ea57811ea2017699ff608e78772c0b
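For context, a minimal sketch of what a name-based pool with shared sentinel options might look like, based on the pattern in the linked docs. This is not the gist itself; the module name, pool size, group, and hosts below are placeholders:

defmodule MyApp.Redix do
  @pool_size 5

  # Sentinel options shared by every connection in the pool (placeholders).
  @redix_opts [
    sentinel: [
      group: "mymaster",
      sentinels: [[host: "sentinel", port: 26379]]
    ]
  ]

  def child_spec(_args) do
    # Start @pool_size Redix connections, each registered under its own name.
    children =
      for index <- 0..(@pool_size - 1) do
        opts = Keyword.put(@redix_opts, :name, :"redix_#{index}")
        Supervisor.child_spec({Redix, opts}, id: {Redix, index})
      end

    %{
      id: RedixSupervisor,
      type: :supervisor,
      start: {Supervisor, :start_link, [children, [strategy: :one_for_one]]}
    }
  end

  # Route each command to a random connection in the pool.
  def command(command) do
    Redix.command(:"redix_#{Enum.random(0..(@pool_size - 1))}", command)
  end
end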
Hey @tsturzl, thanks for the report!
This is how I'm spawning the pool:
This looks good at a glance, yep.
I'm really lost as to why Redix thinks there is no viable sentinel connection
When Redix complains about this, it's because it either can't connect to any of the provided sentinels, or because getting a host/port combination from every sentinel fails. One thing you can try is enabling debug logs, since Redix logs a bunch of info when connecting to sentinels.
Other than that, I need more info to debug this effectively. I don't use Kubernetes and never have, sorry :) If you can provide a minimal reproducible case, ideally with a single connection instead of a pool, that would be ideal. You can try using docker-compose to reproduce a situation similar to the one you have with Kubernetes. For now I'll mark this as "needs more info" :)
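For reference, a rough way to get more verbose output while Redix connects to the sentinels (a sketch; the group and host are placeholders, and the exact logging hooks depend on the Redix version):

# Lower the Logger level so debug-level connection info is visible,
# and attach Redix's default telemetry handler so connection events are logged.
Logger.configure(level: :debug)
Redix.Telemetry.attach_default_handler()

{:ok, conn} =
  Redix.start_link(
    sentinel: [
      group: "mymaster",
      sentinels: [[host: "sentinel", port: 26379]]
    ]
  )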
@tsturzl ping?
Closing this since there's not enough information to reproduce it. If you ever manage to build a minimal reproducible case, @tsturzl, feel free to ping here and we'll reopen the issue. Thanks!
Sure thing. I've been pulled away from this for now and will revisit. I did try to make a reproducible docker-compose setup, but I believe this is likely caused by the way Kubernetes services work, so it's not something I'd expect most people to see in a typical use case. Hopefully I'll get back to this soon.
Hi @whatyouhide, I am facing the same error while connecting to a tile38 sentinel. My config:
tile38_redix_opts = [
  sync_connect: true,
  sentinel: [
    timeout: 5000,
    group: "master",
    sentinels: [
      [port: 26379, host: "tile38-sentinel1.domain.vpc"],
      [port: 26379, host: "tile38-sentinel2.domain.vpc"],
      [port: 26379, host: "tile38-sentinel3.domain.vpc"]
    ]
  ]
]
and calling:
{:ok, redis} = Redix.start_link(tile38_redix_opts)
getting
Aborted
** (MatchError) no match of right hand side value: {:error, %Redix.ConnectionError{reason: :no_viable_sentinel_connection}}
(stdlib 4.1.1) erl_eval.erl:496: :erl_eval.expr/6
#cell:bcyuxxgilwm6jv2tezkuhdee7ooeslfu:1: (file)
If I use a single node, it works:
{:ok, conn} = Redix.start_link(host: "tile38-eu-sentinel3.prodeu.vpc", port: 26379)
{:ok, #PID<0.20126.0>}
Do you know what the issue could be here?
Piggybacking on the above, here are some logs:
iex> opts = [
...> sentinel: [
...> timeout: 5000,
...> group: "master",
...> sentinels: [
...> [port: 26379, host: "tile38-eu-sentinel1.prodeu.vpc"],
...> [port: 26379, host: "tile38-eu-sentinel2.prodeu.vpc"],
...> [port: 26379, host: "tile38-eu-sentinel3.prodeu.vpc"]
...> ]
...> ],
...> debug: [:trace]
...> ]
[
sentinel: [
timeout: 5000,
group: "master",
sentinels: [
[port: 26379, host: "tile38-eu-sentinel1.prodeu.vpc"],
[port: 26379, host: "tile38-eu-sentinel2.prodeu.vpc"],
[port: 26379, host: "tile38-eu-sentinel3.prodeu.vpc"]
]
],
debug: [:trace]
]
iex>
nil
iex> {:ok, pid} = Redix.start_link(opts)
{:ok, #PID<0.8886.0>}
*DBG* <10097.8886.0> consume internal init_state in state connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8887.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8887.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},500,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> receive {timeout,reconnect} nil in state disconnected
*DBG* <10097.8886.0> consume {timeout,reconnect} nil in state disconnected => connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8890.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8890.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},750,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> receive {timeout,reconnect} nil in state disconnected
*DBG* <10097.8886.0> consume {timeout,reconnect} nil in state disconnected => connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8891.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8891.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},1125,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
iex(fireflyx@4a6609046fec)17>
nil
*DBG* <10097.8886.0> receive {timeout,reconnect} nil in state disconnected
*DBG* <10097.8886.0> consume {timeout,reconnect} nil in state disconnected => connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8894.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8894.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},1688,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> receive {timeout,reconnect} nil in state disconnected
*DBG* <10097.8886.0> consume {timeout,reconnect} nil in state disconnected => connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8895.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8895.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},2532,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
Hey @sagarrohankar-bsft and @yangou!
If i used single node, its working
{:ok, conn} = Redix.start_link(host: "tile38-eu-sentinel3.prodeu.vpc", port: 26379)
{:ok, #PID<0.20126.0>}
The URL you're using here is different though. I suspect you just "redacted" the ones you showed in the sentinel config, but worth checking!
Yes, the one I used is a redacted one, but we are referring to the same sentinel URLs.
Can you use Redix.Telemetry.attach_default_handler/0 and paste the logs here?
@whatyouhide I think we just found the issue. Our Redis version was too old and doesn't support the ROLE command. Nothing is wrong with the Redix client.
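For reference, one quick way to check from Redix itself whether a server understands ROLE (a sketch; the host and port are taken from the config above, so point it at whichever node the connection is failing against):

# Send ROLE directly and see whether the server understands it.
{:ok, conn} = Redix.start_link(host: "tile38-sentinel1.domain.vpc", port: 26379)

case Redix.command(conn, ["ROLE"]) do
  {:ok, [role | _details]} ->
    IO.puts("ROLE supported, server reports: #{role}")

  {:error, %Redix.Error{message: message}} ->
    # Servers that predate ROLE answer with an "unknown command" error.
    IO.puts("ROLE not supported: #{message}")
end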
Ah, awesome! Got it. Did you get any errors on the client side? That way we could show a nicer error with instructions on what to do.
So I've set up a Redis cluster with Sentinel on k8s using Bitnami's Helm chart. I tested the cluster by connecting to Sentinel with the Redis CLI, calling the SENTINEL command to get the primary, and then connecting to the primary and doing some basic write and read operations. This works; the cluster seemingly works flawlessly. The problem comes when I try to connect Redix. I've tried a lot of things. Redix seems to be able to get the primary, as it logs the proper sentinel primary, and Redix even works if I then connect it to the primary manually.
For example, I have master, slave1, 2, and 3 all on port 6379, and the setup exposes Sentinel through a k8s service (this is kind of like a TCP proxy of sorts) on port 26379. Redix connects to sentinel:26379 just fine, then logs master:6379 as the primary. Then the process crashes and Redix tells me no_viable_sentinel_connection. If I then provide a config to connect to master:6379 as a regular Redis node, it connects and works just fine.
This is odd and confusing: there seems to be an issue with connecting to the primary after it's discovered, yet Redix has no trouble connecting to the primary if I configure it manually. So something seems wrong in the handover from sentinel to primary. I'm wondering if perhaps the fact that Sentinel is behind a k8s service could cause the issue somehow.
My Redis is Redis server v=5.0.8 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=e7d408d535dc93b1. I'm running the latest Bitnami Helm chart as of writing this, on k8s 1.14, using Helm 3.1.2.
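For completeness, the manual CLI test described above could be reproduced with Redix itself along these lines (a sketch; "mymaster" and the key name are placeholders, and the hosts are the k8s service names mentioned above):

# Connect to the sentinel service and ask for the current primary of the group.
{:ok, sentinel} = Redix.start_link(host: "sentinel", port: 26379)

{:ok, [primary_host, primary_port]} =
  Redix.command(sentinel, ["SENTINEL", "get-master-addr-by-name", "mymaster"])

# Connect directly to the reported primary and do a basic write/read.
{:ok, primary} = Redix.start_link(host: primary_host, port: String.to_integer(primary_port))
{:ok, "OK"} = Redix.command(primary, ["SET", "redix_test_key", "hello"])
{:ok, "hello"} = Redix.command(primary, ["GET", "redix_test_key"])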