juicedata / juicefs-csi-driver

JuiceFS CSI Driver
https://github.com/juicedata/juicefs
Apache License 2.0

[BUG] CSI pod socket disconnects, smooth upgrade fails #1139

Open YunhuiChen opened 2 weeks ago

YunhuiChen commented 2 weeks ago

What happened: Running a smooth upgrade fails with:

E1010 08:47:07.515335    2560 grace.go:412] "grace: error connecting to socket" err="dial unix /tmp/juicefs-csi-shutdown.sock: connect: connection refused"
E1010 08:47:07.515385    2560 upgrade.go:42] "main: failed to upgrade mount pod" err="dial unix /tmp/juicefs-csi-shutdown.sock: connect: connection refused"

The mount pod logs show:

2024/10/10 08:29:21.344859 juicefs[1] <WARNING>: send fd to /tmp/fuse_fd_csi_comm.sock: dial unix /tmp/fuse_fd_csi_comm.sock: connect: no such file or directory [passfd.go:123]

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?

Environment:

zwwhdls commented 2 weeks ago

In the CSI node pod:

root@juicefs-csi-node-9897d:/app# lsof -p 8
COMMAND   PID USER   FD      TYPE             DEVICE SIZE/OFF      NODE NAME
juicefs-c   8 root  cwd       DIR              0,252     4096   1473032 /app
juicefs-c   8 root  rtd       DIR              0,252     4096   5505897 /
juicefs-c   8 root  txt       REG              0,252 40513688   5505805 /usr/local/bin/juicefs-csi-driver
juicefs-c   8 root    0r      CHR                1,3      0t0         6 /dev/null
juicefs-c   8 root    1w     FIFO               0,13      0t0 196809539 pipe
juicefs-c   8 root    2w     FIFO               0,13      0t0 196809540 pipe
juicefs-c   8 root    3u  a_inode               0,14        0     11440 [eventpoll]
juicefs-c   8 root    4r  a_inode               0,14        0     11440 inotify
juicefs-c   8 root    5r     FIFO               0,13      0t0 196808381 pipe
juicefs-c   8 root    6w     FIFO               0,13      0t0 196808381 pipe
juicefs-c   8 root    7u  a_inode               0,14        0     11440 [eventpoll]
juicefs-c   8 root    8r     FIFO               0,13      0t0 196810129 pipe
juicefs-c   8 root    9w     FIFO               0,13      0t0 196810129 pipe
juicefs-c   8 root   12u     unix 0xffff9c5e3b9d8c00      0t0 196812115 /tmp/juicefs-csi-shutdown.sock type=STREAM
juicefs-c   8 root   13u     IPv4          196812116      0t0       TCP juicefs-csi-node-9897d:48228->172-28-39-187.kubernetes.default.svc.cluster.local:10250 (ESTABLISHED)
juicefs-c   8 root   14u     IPv6          196808396      0t0       TCP *:http-alt (LISTEN)
juicefs-c   8 root   15u     IPv4          196809565      0t0       TCP localhost:6060 (LISTEN)
juicefs-c   8 root   16u     unix 0xffff9c5e3c367400      0t0 196811093 /csi/csi.sock type=STREAM
juicefs-c   8 root   17u     unix 0xffff9c60ea935800      0t0 196810221 /csi/csi.sock type=STREAM
juicefs-c   8 root   19u     unix 0xffff9c5e3c361000      0t0 197101508 /tmp/00ce3e423bb79f4822ead350ac9be2a58ff9f2000af905298dab7be62b49500/fuse_fd_csi_comm.sock type=STREAM
juicefs-c   8 root   21u      CHR             10,229      0t0        22 /fuse
root@juicefs-csi-node-9897d:/app#
root@juicefs-csi-node-9897d:/app# netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:6060          0.0.0.0:*               LISTEN      8/juicefs-csi-drive
tcp6       0      0 :::8080                 :::*                    LISTEN      8/juicefs-csi-drive
tcp6       0      0 :::9809                 :::*                    LISTEN      -
tcp6       0      0 :::9909                 :::*                    LISTEN      -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ACC ]     STREAM     LISTENING     197101508 8/juicefs-csi-drive  /tmp/00ce3e423bb79f4822ead350ac9be2a58ff9f2000af905298dab7be62b49500/fuse_fd_csi_comm.sock
unix  2      [ ACC ]     STREAM     LISTENING     196812115 8/juicefs-csi-drive  /tmp/juicefs-csi-shutdown.sock
unix  2      [ ACC ]     STREAM     LISTENING     196813895 -                    /registration/csi.juicefs.com-reg.sock
unix  2      [ ACC ]     STREAM     LISTENING     196811093 8/juicefs-csi-drive  /csi/csi.sock
root@juicefs-csi-node-9897d:/app#

Something seems to be wrong with the socket connection, even though the CSI driver is still listening on all of its socket files.

zwwhdls commented 2 weeks ago

Restarting the CSI node pod recovers from this issue.