juicedata / juicefs-csi-driver

JuiceFS CSI Driver

Redis Sentinel Configuration #1126

Open deanpatel2 opened 3 weeks ago

deanpatel2 commented 3 weeks ago

What happened:

I have Redis deployed in Sentinel mode (3 replicas) in a Kubernetes cluster. I am trying to get it configured to work with JuiceFS as the meta URL.

What you expected to happen:

I read the docs on this in Redis Best Practices. It states that the URL should be formatted like this:

redis[s]://[[USER]:PASSWORD@]MASTER_NAME,SENTINEL_ADDR[,SENTINEL_ADDR]:SENTINEL_PORT[/DB]

and I am passing the SENTINEL_PASSWORD as an environment variable. My master set name is redis-sentinel-master.

Since Redis is deployed in K8s, there are no static IPs; the Redis nodes are load-balanced behind a Service. The services are:

redis-sentinel                            ClusterIP   172.20.206.57    <none>        6379/TCP,26379/TCP   18h
redis-sentinel-headless                   ClusterIP   None             <none>        6379/TCP,26379/TCP   18h
redis-sentinel-metrics                    ClusterIP   172.20.82.228    <none>        9121/TCP             18h
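
(As a DNS sanity check: per-pod records for a StatefulSet are normally published under its headless service, so resolution can be verified from a debug pod in the same namespace — a sketch, hostnames follow the naming above:)

nslookup redis-sentinel-node-0.redis-sentinel-headless.<namespace>.svc.cluster.local
nslookup redis-sentinel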

So I formatted the SENTINEL_ADDR fields with both redis-sentinel and redis-sentinel-headless. For example with redis-sentinel:

redis://:****@redis-sentinel-master,redis-sentinel-node-0.redis-sentinel.<namespace>.svc.cluster.local,redis-sentinel-node-1.redis-sentinel.<namespace>.svc.cluster.local,redis-sentinel-node-2.redis-sentinel.<namespace>.svc.cluster.local:26379/2

Neither worked, and I expected at least one of them to. Instead, I got these logs on the juicefs-csi-controller pod:

2024/09/05 19:30:22.295329 juicefs[18] <INFO>: Meta address: redis://:****@redis-sentinel-master,redis-sentinel-node-0.redis-sentinel.<namespace>.svc.cluster.local,redis-sentinel-node-1.redis-sentinel.<namespace>.svc.cluster.local,redis-sentinel-node-2.redis-sentinel.<namespace>.svc.cluster.local:26379/2 [interface.go:497]
redis: 2024/09/05 19:30:22 sentinel.go:537: sentinel: GetMasterAddrByName master="redis-sentinel-master" failed: NOAUTH HELLO must be called with the client already authenticated, otherwise the HELLO <proto> AUTH <user> <pass> option can be used to authenticate the client and select the RESP protocol version at the same time
2024/09/05 19:30:22.308544 juicefs[18] <WARNING>: parse info: redis: all sentinels specified in configuration are unreachable [redis.go:3575]
redis: 2024/09/05 19:30:22 sentinel.go:537: sentinel: GetMasterAddrByName master="redis-sentinel-master" failed: NOAUTH HELLO must be called with the client already authenticated, otherwise the HELLO <proto> AUTH <user> <pass> option can be used to authenticate the client and select the RESP protocol version at the same time
2024/09/05 19:30:22.316619 juicefs[18] <FATAL>: redis: all sentinels specified in configuration are unreachable [main.go:31]

It seems clearly auth-related: I was able to get the Sentinel connection working by disabling auth. However, I am passing the password as the docs describe, so I am not sure what I am doing wrong. Am I passing the password incorrectly? Does it need something more than SENTINEL_PASSWORD, REDIS_PASSWORD, or META_PASSWORD?
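
For what it's worth, the failing call in the logs (GetMasterAddrByName) corresponds roughly to this redis-cli check against one of the sentinels (a sketch, assuming redis-cli is reachable from inside the cluster and SENTINEL_PASSWORD is set):

# query a sentinel directly, authenticating with the sentinel password
redis-cli -h redis-sentinel -p 26379 -a "$SENTINEL_PASSWORD" sentinel get-master-addr-by-name redis-sentinel-master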

How to reproduce it (as minimally and precisely as possible):

Deploy Redis in Sentinel mode to Kubernetes and deploy JuiceFS with meta URL pointing to the Sentinel.
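
For example, with the Bitnami chart (a sketch; values abbreviated, release name chosen to match the service names above):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis-sentinel bitnami/redis \
  --set sentinel.enabled=true \
  --set auth.enabled=true \
  --set replica.replicaCount=3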

Anything else we need to know?

Environment:

deanpatel2 commented 2 weeks ago

Should I be adding the environment variables in this section of the juicefs-secret instead of on controller.envs?

https://juicefs.com/docs/csi/guide/pv/#cloud-service
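
Something like the following, with the password inside the secret's envs JSON (a sketch with a masked value):

stringData:
  envs: '{"SENTINEL_PASSWORD": "****"}'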

zhijian-pro commented 2 weeks ago

The current environment is rather complex. Can Redis Sentinel be tested in a non-K8s environment? If it works there, that proves the environment variables passed by K8s or the Redis configuration are incorrect; if the same error occurs, it could be a bug on our side.
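
For example, from any machine that can reach the sentinels directly, something like this (a sketch; substitute real addresses, password, and DB number):

# password for connecting to the sentinels themselves
export SENTINEL_PASSWORD=****
# juicefs status only reads the metadata engine, so it is a safe connectivity test
juicefs status redis://:****@redis-sentinel-master,<sentinel-addr>:26379/2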

deanpatel2 commented 2 weeks ago

@zhijian-pro Unfortunately it is very difficult for us to test in a non-K8s environment as all our deployments are managed in K8s clusters.

Regarding my most recent comment, I did add the environment variables in the envs part of the secret that gets passed to the csi.storage.k8s.io/node-publish-secret-name field, following these docs. Specifically, I am creating a StorageClass like this:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-sc
mountOptions:
- writeback
parameters:
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: ****
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ****
  juicefs/clean-cache: "true"
  juicefs/mount-cpu-limit: "1"
  juicefs/mount-cpu-request: "1"
  juicefs/mount-memory-limit: 1Gi
  juicefs/mount-memory-request: 1Gi
provisioner: csi.juicefs.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

where juicefs-secret has the fields name, access-key, secret-key, bucket, storage, metaurl, and envs.

My metaurl looks like this: redis://@redis-sentinel,redis-sentinel.conductor.svc.cluster.local:26379/2. Redis was deployed to K8s using the Bitnami redis chart.

The envs part of the secret config above looks like this now, with the environment variables for authentication: '{"AWS_REGION": "****", "SENTINEL_PASSWORD": "****", "META_PASSWORD": "****"}'
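
Put together, the secret looks roughly like this (a sketch; values are masked, and storage is shown as s3 just for illustration):

apiVersion: v1
kind: Secret
metadata:
  name: juicefs-secret
  namespace: ****
type: Opaque
stringData:
  name: ****
  storage: s3
  bucket: ****
  access-key: ****
  secret-key: ****
  metaurl: redis://@redis-sentinel,redis-sentinel.conductor.svc.cluster.local:26379/2
  envs: '{"AWS_REGION": "****", "SENTINEL_PASSWORD": "****", "META_PASSWORD": "****"}'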

Since I started passing SENTINEL_PASSWORD and META_PASSWORD like this, I have been getting what I believe to be healthy logs from the controller pod:

I0911 13:55:55.469456       7 main.go:94] Run CSI controller
I0911 13:55:55.504867       7 driver.go:50] Driver: csi.juicefs.com version v0.23.5 commit eea17bf17327cc9110c1a5729942058ce58d13ce date 2024-03-08T08:12:47Z
I0911 13:55:55.556920       7 driver.go:115] Listening for connection on address: &net.UnixAddr{Name:"/var/lib/csi/sockets/pluginproxy/csi.sock", Net:"unix"}
I0911 13:55:55.558864       7 leaderelection.go:248] attempting to acquire leader lease conductor/csi.juicefs.com...
I0911 13:55:56.063959       7 mount_manager.go:114] Mount manager started.
I0911 13:55:56.064228       7 leaderelection.go:248] attempting to acquire leader lease conductor/mount.juicefs.com...
I0911 13:56:12.094890       7 leaderelection.go:258] successfully acquired lease conductor/csi.juicefs.com
I0911 13:56:12.095016       7 controller.go:835] Starting provisioner controller csi.juicefs.com_juicefs-csi-controller-0_7107eb11-f30d-48c0-8c98-02796ea8537e!
I0911 13:56:12.095052       7 event.go:282] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"conductor", Name:"csi.juicefs.com", UID:"4d533303-55fb-4c46-b938-4b172be70532", APIVersion:"v1", ResourceVersion:"257540012", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' juicefs-csi-controller-0_7107eb11-f30d-48c0-8c98-02796ea8537e became leader
I0911 13:56:12.196152       7 controller.go:884] Started provisioner controller csi.juicefs.com_juicefs-csi-controller-0_7107eb11-f30d-48c0-8c98-02796ea8537e!
I0911 13:56:12.196355       7 controller.go:1472] delete "pvc-7e66e53a-3c17-42ea-b97c-abff4b9504fb": started
I0911 13:56:12.196658       7 controller.go:1332] provision "conductor/juicefs-pvc" class "juicefs-sc": started
I0911 13:56:12.196874       7 controller.go:1472] delete "pvc-c36b80c2-95ba-48eb-b26c-4da2cf09bd42": started
I0911 13:56:12.197195       7 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"conductor", Name:"juicefs-pvc", UID:"88ba0426-7908-439d-8dda-b16507a28c89", APIVersion:"v1", ResourceVersion:"257539610", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "conductor/juicefs-pvc"
I0911 13:56:12.201627       7 controller.go:1439] provision "conductor/juicefs-pvc" class "juicefs-sc": volume "pvc-88ba0426-7908-439d-8dda-b16507a28c89" provisioned
I0911 13:56:12.201738       7 controller.go:1456] provision "conductor/juicefs-pvc" class "juicefs-sc": succeeded

I was then expecting to be able to mount a volume into a separate pod to actually use JuiceFS. However, when I create a PersistentVolumeClaim and mount it into another service, I get a FailedMount error with the same NOAUTH issues, even though those logs no longer appear on the JuiceFS controller pods. This is very confusing to me; the logs are below. I used to get them on the controller itself, as you can see in my original post.

MountVolume.SetUp failed for volume "pvc-037c46b0-a3c4-435e-bf1c-d699ac7f7e96" : rpc error: code = Internal desc = Could not mount juicefs:
2024/09/11 14:18:19.069148 juicefs[54] <INFO>: Meta address: redis://default:****@conductor-redis-sentinel,conductor-redis-sentinel.conductor.svc.cluster.local:26379/2 [interface.go:497]
redis: 2024/09/11 14:18:19 sentinel.go:537: sentinel: GetMasterAddrByName master="conductor-redis-sentinel" failed: NOAUTH HELLO must be called with the client already authenticated, otherwise the HELLO <proto> AUTH <user> <pass> option can be used to authenticate the client and select the RESP protocol version at the same time
2024/09/11 14:18:19.077084 juicefs[54] <WARNING>: parse info: redis: all sentinels specified in configuration are unreachable [redis.go:3575]
redis: 2024/09/11 14:18:19 sentinel.go:537: sentinel: GetMasterAddrByName master="conductor-redis-sentinel" failed: NOAUTH HELLO must be called with the client already authenticated, otherwise the HELLO <proto> AUTH <user> <pass> option can be used to authenticate the client and select the RESP protocol version at the same time
2024/09/11 14:18:19.086926 juicefs[54] <FATAL>: load setting: redis: all sentinels specified in configuration are unreachable [status.go:96] : exit status 1

Why would the NOAUTH logs appear on the pod trying to mount JuiceFS, but no longer on the JuiceFS controller itself?
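
For completeness, the PVC I am creating looks roughly like this (a sketch; the access mode and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: juicefs-pvc
  namespace: conductor
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: juicefs-sc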

davies commented 3 days ago

This looks like a bug in CSI; transferring it.

zwwhdls commented 2 days ago

Hi @deanpatel2, can you find the mount pod and debug into it?

kubectl -n <namespace> debug <mountpod> -it --copy-to=myapp --container=jfs-mount --image=<mountpodimage> -- bash

# once inside, check the SENTINEL_PASSWORD env var
env | grep SENTINEL_PASSWORD
# test whether the meta engine can be connected to
juicefs format xxxx

Also, make sure sentinel is enabled in Redis; see https://github.com/bitnami/charts/blob/main/bitnami/redis/values.yaml#L1129
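
In the chart values, that corresponds to something like this (a sketch of the relevant bitnami/redis values; auth.sentinel controls whether the sentinels themselves require the password):

sentinel:
  enabled: true
auth:
  enabled: true
  sentinel: true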