openebs / mayastor

Dynamically provision Stateful Persistent Replicated Cluster-wide Fabric Volumes & Filesystems for Kubernetes, provisioned from an optimized NVMe SPDK backend data storage stack.
Apache License 2.0

Long reboot time due to NVMe reconnect #1764

Open todeb opened 5 days ago

todeb commented 5 days ago

Describe the bug: after initiating a reboot of a k8s node running Mayastor, the OS spent about 8 minutes trying to reconnect NVMe devices before shutting down.

To Reproduce: initiate an OS reboot of the node.

Expected behavior: I don't know how these NVMe devices are handled, but IMO NVMe reconnect attempts should not block a system reboot, especially when the connections are failing.

Screenshots

Oct 24 14:56:38 node1 systemd[1]: Unmounted var-lib-kubelet-pods-c4376117\x2d0426\x2d4eda\x2d8590\x2d5d9fbc8c8a5d-volumes-kubernetes.io\x7ecsi-pvc\x2d81c7ff78\x2d70b7\x2d4cab\x2db72e\x2d76bfd6888e42-mount.mount - /var/lib/>
Oct 24 14:56:38 node1 systemd[1]: var-lib-kubelet-pods-73932d89\x2d7b3a\x2d4ecd\x2da250\x2d94292beb8bbf-volumes-kubernetes.io\x7ecsi-pvc\x2d793eefb1\x2d829a\x2d4e3f\x2da2db\x2d5108f865e410-mount.mount: Mount process still >
Oct 24 14:56:38 node1 systemd[1]: var-lib-kubelet-pods-73932d89\x2d7b3a\x2d4ecd\x2da250\x2d94292beb8bbf-volumes-kubernetes.io\x7ecsi-pvc\x2d793eefb1\x2d829a\x2d4e3f\x2da2db\x2d5108f865e410-mount.mount: Failed with result '>
Oct 24 14:56:38 node1 systemd[1]: var-lib-kubelet-pods-73932d89\x2d7b3a\x2d4ecd\x2da250\x2d94292beb8bbf-volumes-kubernetes.io\x7ecsi-pvc\x2d793eefb1\x2d829a\x2d4e3f\x2da2db\x2d5108f865e410-mount.mount: Unit process 2456087>
Oct 24 14:56:38 node1 systemd[1]: Unmounted var-lib-kubelet-pods-73932d89\x2d7b3a\x2d4ecd\x2da250\x2d94292beb8bbf-volumes-kubernetes.io\x7ecsi-pvc\x2d793eefb1\x2d829a\x2d4e3f\x2da2db\x2d5108f865e410-mount.mount - /var/lib/>
Oct 24 14:56:38 node1 systemd[1]: Stopped target local-fs-pre.target - Preparation for Local File Systems.
Oct 24 14:56:38 node1 systemd[1]: Reached target umount.target - Unmount All Filesystems.
Oct 24 14:56:38 node1 systemd[1]: Stopping lvm2-monitor.service - Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Oct 24 14:56:38 node1 systemd[1]: Stopping multipathd.service - Device-Mapper Multipath Device Controller...
Oct 24 14:56:38 node1 multipathd[3636859]: multipathd: shut down
Oct 24 14:56:38 node1 systemd[1]: systemd-tmpfiles-setup-dev.service: Deactivated successfully.
Oct 24 14:56:38 node1 systemd[1]: Stopped systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev.
Oct 24 14:56:38 node1 systemd[1]: systemd-tmpfiles-setup-dev-early.service: Deactivated successfully.
Oct 24 14:56:38 node1 systemd[1]: Stopped systemd-tmpfiles-setup-dev-early.service - Create Static Device Nodes in /dev gracefully.
Oct 24 14:56:38 node1 kernel: block nvme0n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 kernel: block nvme0n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 kernel: block nvme1n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 kernel: block nvme1n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 kernel: block nvme2n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 kernel: block nvme2n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 kernel: block nvme3n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 kernel: block nvme3n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 kernel: block nvme5n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 kernel: block nvme5n1: no usable path - requeuing I/O
Oct 24 14:56:38 node1 systemd[1]: multipathd.service: Deactivated successfully.
Oct 24 14:56:38 node1 systemd[1]: Stopped multipathd.service - Device-Mapper Multipath Device Controller.
Oct 24 14:56:38 node1 systemd[1]: multipathd.service: Consumed 1min 20.798s CPU time, 19.0M memory peak, 0B memory swap peak.
Oct 24 14:56:38 node1 systemd[1]: systemd-remount-fs.service: Deactivated successfully.
Oct 24 14:56:38 node1 systemd[1]: Stopped systemd-remount-fs.service - Remount Root and Kernel File Systems.
Oct 24 14:56:40 node1 kernel: tasks_rcu_exit_srcu_stall: rcu_tasks grace period number 1785 (since boot) gp_state: RTGS_POST_SCAN_TASKLIST is 460794 jiffies old.
Oct 24 14:56:40 node1 kernel: Please check any exiting tasks stuck between calls to exit_tasks_rcu_start() and exit_tasks_rcu_finish()
Oct 24 14:56:40 node1 kernel: nvme nvme7: failed to connect socket: -111
Oct 24 14:56:40 node1 kernel: nvme nvme0: failed to connect socket: -111
Oct 24 14:56:40 node1 kernel: nvme nvme8: failed to connect socket: -111
Oct 24 14:56:40 node1 kernel: nvme nvme8: Failed reconnect attempt 46
Oct 24 14:56:40 node1 kernel: nvme nvme8: Reconnecting in 10 seconds...
Oct 24 14:56:40 node1 kernel: nvme nvme1: failed to connect socket: -111
Oct 24 14:56:40 node1 kernel: nvme nvme1: Failed reconnect attempt 46
Oct 24 14:56:40 node1 kernel: nvme nvme1: Reconnecting in 10 seconds...
Oct 24 14:56:40 node1 kernel: nvme nvme5: failed to connect socket: -111
Oct 24 14:56:40 node1 kernel: nvme nvme5: Failed reconnect attempt 46
Oct 24 14:56:40 node1 kernel: nvme nvme5: Reconnecting in 10 seconds...
Oct 24 14:56:40 node1 kernel: nvme nvme3: failed to connect socket: -111
Oct 24 14:56:40 node1 kernel: nvme nvme3: Failed reconnect attempt 46
Oct 24 14:56:40 node1 kernel: nvme nvme3: Reconnecting in 10 seconds...
Oct 24 14:56:40 node1 kernel: nvme nvme6: failed to connect socket: -111
Oct 24 14:56:40 node1 kernel: nvme nvme6: Failed reconnect attempt 46
Oct 24 14:56:40 node1 kernel: nvme nvme6: Reconnecting in 10 seconds...
Oct 24 14:56:40 node1 kernel: nvme nvme2: failed to connect socket: -111
Oct 24 14:56:40 node1 kernel: nvme nvme2: Failed reconnect attempt 46
Oct 24 14:56:40 node1 kernel: nvme nvme2: Reconnecting in 10 seconds...
Oct 24 14:56:40 node1 kernel: nvme nvme7: Failed reconnect attempt 46
Oct 24 14:56:40 node1 kernel: nvme nvme0: Failed reconnect attempt 46
Oct 24 14:56:40 node1 kernel: nvme nvme7: Reconnecting in 10 seconds...
Oct 24 14:56:40 node1 kernel: nvme nvme0: Reconnecting in 10 seconds...
Oct 24 14:56:50 node1 kernel: tasks_rcu_exit_srcu_stall: rcu_tasks grace period number 1785 (since boot) gp_state: RTGS_POST_SCAN_TASKLIST is 471034 jiffies old.
Oct 24 14:56:50 node1 kernel: Please check any exiting tasks stuck between calls to exit_tasks_rcu_start() and exit_tasks_rcu_finish()
Oct 24 14:56:50 node1 kernel: nvme nvme6: failed to connect socket: -111
Oct 24 14:56:50 node1 kernel: nvme nvme5: failed to connect socket: -111
Oct 24 14:56:50 node1 kernel: nvme nvme1: failed to connect socket: -111
Oct 24 14:56:50 node1 kernel: nvme nvme1: Failed reconnect attempt 47
Oct 24 14:56:50 node1 kernel: nvme nvme1: Reconnecting in 10 seconds...
Oct 24 14:56:50 node1 kernel: nvme nvme0: failed to connect socket: -111
Oct 24 14:56:50 node1 kernel: nvme nvme0: Failed reconnect attempt 47
Oct 24 14:56:50 node1 kernel: nvme nvme0: Reconnecting in 10 seconds...
Oct 24 14:56:50 node1 kernel: nvme nvme2: failed to connect socket: -111
Oct 24 14:56:50 node1 kernel: nvme nvme2: Failed reconnect attempt 47
Oct 24 14:56:50 node1 kernel: nvme nvme2: Reconnecting in 10 seconds...
Oct 24 14:56:50 node1 kernel: nvme nvme7: failed to connect socket: -111
.....
Oct 24 15:04:41 node1 kernel: nvme nvme0: failed to connect socket: -111
Oct 24 15:04:41 node1 kernel: nvme nvme0: Failed reconnect attempt 93
Oct 24 15:04:41 node1 kernel: nvme nvme0: Reconnecting in 10 seconds...
Oct 24 15:04:41 node1 kernel: nvme nvme6: Failed reconnect attempt 93
Oct 24 15:04:41 node1 kernel: nvme nvme5: Reconnecting in 10 seconds...
Oct 24 15:04:41 node1 kernel: nvme nvme6: Reconnecting in 10 seconds...
Oct 24 15:04:49 node1 systemd-shutdown[1]: Syncing filesystems and block devices - timed out, issuing SIGKILL to PID 2456488.
Oct 24 15:04:49 node1 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Oct 24 15:04:49 node1 systemd-journald[734]: Journal stopped


tiagolobocastro commented 5 days ago

Did you reboot without draining the node? The nvme kernel initiator will keep trying to connect for some time. If you really want to do this, then you need to do something like this on the host:

for dev in /sys/class/nvme/*/ctrl_loss_tmo; do echo 10 | sudo tee -a "$dev"; done

But I suggest you drain the node before rebooting; otherwise the filesystems may not unmount gracefully, which could potentially result in data loss.
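For reference, a minimal sketch of inspecting that reconnect behaviour on the host, assuming the standard nvme-fabrics sysfs attributes (ctrl_loss_tmo and state) are exposed for these controllers; a ctrl_loss_tmo of -1 means the initiator retries forever:

# seconds each controller keeps retrying before giving up (-1 = retry forever)
grep . /sys/class/nvme/*/ctrl_loss_tmo
# current controller states (e.g. live, connecting, resetting, deleting)
grep . /sys/class/nvme/*/state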

todeb commented 5 days ago

Do you mean the kubectl-mayastor drain command or the usual kubectl drain to evict pods? I'm not draining the node because I'm using a single replica and the apps handle their own replication, so I don't want them to be scheduled on other nodes.

tiagolobocastro commented 5 days ago

Do you mean the kubectl-mayastor drain command or the usual kubectl drain to evict pods?

I mean the usual kubectl drain, to evict the pods using the Mayastor volumes onto other nodes.

I'm not draining the node because I'm using a single replica and the apps handle their own replication, so I don't want them to be scheduled on other nodes.

Ah I see... are you using the GracefulNodeShutdown feature? IIRC that should ensure the apps are stopped gracefully, but the volumeattachments and connections would still remain? CC @Abhinandan-Purkait
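As a side note, a quick way to check whether GracefulNodeShutdown is configured on a node is to look for the shutdownGracePeriod settings in the kubelet config; the path below assumes the common default location and may differ on your distribution:

# non-zero shutdownGracePeriod / shutdownGracePeriodCriticalPods values mean the feature is active
grep -i shutdownGracePeriod /var/lib/kubelet/config.yaml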

Abhinandan-Purkait commented 4 days ago

Ah I see... are you using the GracefulNodeShutdown feature? IIRC that should ensure the apps are stopped gracefully, but the volumeattachments and connections would still remain? CC @Abhinandan-Purkait

Yes, that's true. IIRC we need to remove the attachments manually.
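For illustration, a rough sketch of that manual cleanup; the attachment name below is a placeholder, so verify which attachments belong to the rebooted node before deleting anything:

# list attachments and the node each one is bound to
kubectl get volumeattachments -o wide
# delete a stale attachment left behind for the shut-down node (placeholder name)
kubectl delete volumeattachment <attachment-name>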

todeb commented 4 days ago

Not using it.