whatyouhide / redix

Fast, pipelined, resilient Redis driver for Elixir.
http://hexdocs.pm/redix
MIT License

Redix claims no_viable_sentinel_connection but CLI connects #172

Closed tsturzl closed 4 years ago

tsturzl commented 4 years ago

So I've set up a Redis cluster with Sentinel on k8s using Bitnami's Helm chart. I tested the cluster by connecting to Sentinel with the Redis CLI, calling the SENTINEL command to get the primary, and then connecting to the primary and doing some basic write and read operations. This works; the cluster seemingly works flawlessly. The problem comes when I try to connect Redix. I've tried a lot of things. Redix seems to be able to get the primary, as it logs the correct primary reported by Sentinel. Redix even works if I then connect it to the primary manually.

For example, I have master and slave1, 2, and 3 all on port 6379, and the setup exposes sentinel through a k8s service (a kind of TCP proxy) on port 26379. Redix connects to sentinel:26379 just fine, then logs master:6379 as the primary. Then the process crashes and Redix tells me no_viable_sentinel_connection. If I then provide a config to connect to master:6379 as a regular Redis node, it connects and works just fine.

This is odd and confusing: there seems to be an issue with connecting to the primary after it's discovered, yet Redix has no trouble connecting to the primary if I configure it manually. So something seems to go wrong in the handover from sentinel to primary. I'm wondering if the fact that sentinel is behind a k8s service could somehow cause the issue.

My Redis is Redis server v=5.0.8 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=e7d408d535dc93b1. I'm running the latest Bitnami Helm chart as of writing, on k8s 1.14 using Helm 3.1.2.

tsturzl commented 4 years ago

As far as I can tell from reading the source, Redix runs exactly the same command to get the primary as I do when I use the CLI, and then it tries to connect to that host. I'm really lost as to why Redix thinks there is no viable sentinel connection. One more detail that may help: I'm running a pool of connections, each given the sentinel config, using the name-based pooling approach in the docs (https://hexdocs.pm/redix/real-world-usage.html#name-based-pool). I'm not sure if that changes anything.

tsturzl commented 4 years ago

This is how I'm spawning the pool: https://gist.github.com/tsturzl/c8ea57811ea2017699ff608e78772c0b

whatyouhide commented 4 years ago

Hey @tsturzl, thanks for the report!

> This is how I'm spawning the pool:

This looks good at a glance, yep.

> I'm really lost as to why Redix thinks there is no viable sentinel connection

When Redix complains about this it's because it either can't connect to any of the provided sentinels, or because trying to get a host/port combination from all sentinels fails. One thing you can try is enabling debug logs, since Redix logs a bunch of info when connecting to sentinels.
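A minimal sketch of turning on debug logs, assuming a standard Mix project where Logger is configured in `config/config.exs` (the file location and the choice to set the level globally are assumptions, not something stated in this thread):

```elixir
# config/config.exs (assumed location)
import Config

# Let :debug-level messages through so connection attempts become visible.
config :logger, level: :debug
```

For a one-off IEx session, calling `Logger.configure(level: :debug)` before starting the connection should have the same effect.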

Other than that, I need more info to effectively try and debug this. I don't use Kubernetes and never did, sorry :) If you can provide a minimal reproducible case, ideally with a single connection instead of a pool, that would be great. You can try to use docker-compose to reproduce a situation similar to the one that you have with Kubernetes. For now I'll mark this as "needs more info" :)

whatyouhide commented 4 years ago

@tsturzl ping?

whatyouhide commented 4 years ago

Closing this since there's not enough information to reproduce this. If you ever manage to build a minimal reproducing case @tsturzl, feel free to ping here and we'll reopen the issue. Thanks!

tsturzl commented 4 years ago

Sure thing. I've been pulled away from this for now and will revisit it. I did try to make a reproducible Docker Compose setup, but I believe this is likely caused by the way Kubernetes services work. So this is certainly not something I'd expect most people to see in a typical use case. Hopefully I'll get back to this soon.

sagarrohankar-bsft commented 1 year ago

Hi @whatyouhide, I'm facing the same error while connecting to a tile38 sentinel. My config:

tile38_redix_opts = [
  sync_connect: true,
  sentinel: [
    timeout: 5000,
    group: "master",
    sentinels: [
      [port: 26379, host: "tile38-sentinel1.domain.vpc"],
      [port: 26379, host: "tile38-sentinel2.domain.vpc"],
      [port: 26379, host: "tile38-sentinel3.domain.vpc"]
    ]
  ]
]

and calling: {:ok, redis} = Redix.start_link(tile38_redix_opts)

getting

Aborted
** (MatchError) no match of right hand side value: {:error, %Redix.ConnectionError{reason: :no_viable_sentinel_connection}}
    (stdlib 4.1.1) erl_eval.erl:496: :erl_eval.expr/6
    #cell:bcyuxxgilwm6jv2tezkuhdee7ooeslfu:1: (file)

If I use a single node, it works:

{:ok, conn} = Redix.start_link(host: "tile38-eu-sentinel3.prodeu.vpc", port: 26379)
{:ok, #PID<0.20126.0>}

Do you know what the issue could be here?

yangou commented 1 year ago

Piggybacking on the above, here are some logs:

iex> opts = [
...>   sentinel: [
...>     timeout: 5000,
...>     group: "master",
...>     sentinels: [
...>       [port: 26379, host: "tile38-eu-sentinel1.prodeu.vpc"],
...>       [port: 26379, host: "tile38-eu-sentinel2.prodeu.vpc"],
...>       [port: 26379, host: "tile38-eu-sentinel3.prodeu.vpc"]
...>     ]
...>   ],
...>   debug: [:trace]
...> ]
[
  sentinel: [
    timeout: 5000,
    group: "master",
    sentinels: [
      [port: 26379, host: "tile38-eu-sentinel1.prodeu.vpc"],
      [port: 26379, host: "tile38-eu-sentinel2.prodeu.vpc"],
      [port: 26379, host: "tile38-eu-sentinel3.prodeu.vpc"]
    ]
  ],
  debug: [:trace]
]
iex>
nil
iex> {:ok, pid} = Redix.start_link(opts)

{:ok, #PID<0.8886.0>}
*DBG* <10097.8886.0> consume internal init_state in state connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8887.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8887.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},500,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> receive {timeout,reconnect} nil in state disconnected
*DBG* <10097.8886.0> consume {timeout,reconnect} nil in state disconnected => connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8890.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8890.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},750,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> receive {timeout,reconnect} nil in state disconnected
*DBG* <10097.8886.0> consume {timeout,reconnect} nil in state disconnected => connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8891.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8891.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},1125,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
iex(fireflyx@4a6609046fec)17>
nil
*DBG* <10097.8886.0> receive {timeout,reconnect} nil in state disconnected
*DBG* <10097.8886.0> consume {timeout,reconnect} nil in state disconnected => connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8894.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8894.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},1688,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> receive {timeout,reconnect} nil in state disconnected
*DBG* <10097.8886.0> consume {timeout,reconnect} nil in state disconnected => connecting
*DBG* <10097.8886.0> receive info {stopped,<0.8895.0>,no_viable_sentinel_connection} in state connecting
*DBG* <10097.8886.0> receive internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected
*DBG* <10097.8886.0> consume info {stopped,<0.8895.0>,no_viable_sentinel_connection} in state connecting => disconnected
*DBG* <10097.8886.0> start_timer {{timeout,reconnect},2532,nil,[]} in state disconnected
*DBG* <10097.8886.0> consume internal {notify_of_disconnection,no_viable_sentinel_connection} in state disconnected

whatyouhide commented 1 year ago

Hey @sagarrohankar-bsft and @yangou 👋

> If i used single node, its working
>
> {:ok, conn} = Redix.start_link(host: "tile38-eu-sentinel3.prodeu.vpc", port: 26379)
> {:ok, #PID<0.20126.0>}

The URL you're using here is different though. I suspect you just "redacted" the ones you showed in the sentinel config, but worth checking 😉

sagarrohankar-bsft commented 1 year ago

Yes, the one I used is the redacted one, but we are referring to the same sentinel URLs.

whatyouhide commented 1 year ago

Can you use Redix.Telemetry.attach_default_handler/0 and paste logs here?

yangou commented 1 year ago

@whatyouhide I think we just found the issue. Our Redis version was too old and doesn't support the ROLE command. There is nothing wrong with the Redix client.
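For readers who land on this thread with the same error, here is a simplified model of why a missing ROLE command could surface as `:no_viable_sentinel_connection`. It is a sketch, not Redix's source: it assumes that after a sentinel reports the primary's address, the client verifies the node's role before accepting it, and that any failure moves on to the next sentinel. `SentinelSketch`, `ask_sentinel`, and `check_role` are hypothetical names, and the two functions stand in for real network calls.

```elixir
defmodule SentinelSketch do
  # Hypothetical model of sentinel-based resolution; NOT Redix's implementation.
  # `ask_sentinel` and `check_role` are injected stand-ins for network calls.
  def resolve(sentinels, ask_sentinel, check_role) do
    Enum.find_value(sentinels, {:error, :no_viable_sentinel_connection}, fn sentinel ->
      with {:ok, {host, port}} <- ask_sentinel.(sentinel),
           :ok <- check_role.(host, port) do
        {:ok, {host, port}}
      else
        # Any failure (sentinel unreachable, role check rejected) means
        # "try the next sentinel".
        _error -> nil
      end
    end)
  end
end

sentinels = [[host: "sentinel1", port: 26379], [host: "sentinel2", port: 26379]]

# Every sentinel answers with the primary's address...
ask_sentinel = fn _sentinel -> {:ok, {"master", 6379}} end

# ...but the server is too old to understand ROLE, so verification fails.
role_unsupported = fn _host, _port -> {:error, :unknown_command} end
role_ok = fn _host, _port -> :ok end

IO.inspect(SentinelSketch.resolve(sentinels, ask_sentinel, role_unsupported))
IO.inspect(SentinelSketch.resolve(sentinels, ask_sentinel, role_ok))
```

In this model the first call yields `{:error, :no_viable_sentinel_connection}` even though every sentinel answered, which matches the symptom of a sentinel reporting the right primary while the connection still fails. The second call yields `{:ok, {"master", 6379}}`.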

whatyouhide commented 1 year ago

Ah, awesome! Got it. Did you get any errors on the client side? If so, we could surface a nicer error with instructions on what to do.