louhisuo opened this issue 2 weeks ago (status: Open)
Please use the NAS's account and password, not CHAP A/P.
Thanks.
Thank you. Based on the above I managed to make progress; however, I am now hitting another issue that looks very similar to "pvc is created but pod is unable to mount the volume" (#13). I am also running a Talos Linux single-node cluster.
I made some progress and can now configure the backend with the previously defined `TridentBackendConfig`. I am now facing an issue where the pod cannot consume the PVC and gets stuck in status ContainerCreating.
I have the following StorageClass:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: quts-hero-ssd-raid1
provisioner: csi.trident.qnap.io
parameters:
  selector: "performance=basic"
allowVolumeExpansion: true
```
the following PVC:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: quts-hero-test-pvc
spec:
  storageClassName: quts-hero-ssd-raid1
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```
and the following Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-one
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multi-deployment
  template:
    metadata:
      labels:
        app: multi-deployment
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: storage
              mountPath: /tmp/k8s
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: quts-hero-test-pvc
```
I see the following event on the pod:

```
Events:
  Type     Reason       Age                  From     Message
  ----     ------       ----                 ----     -------
  Warning  FailedMount  41s (x115 over 26h)  kubelet  MountVolume.MountDevice failed for volume "pvc-b271b1cd-03f6-4c32-a0cb-33a5edf2a7c7" : rpc error: code = Internal desc = rpc error: code = Internal desc = failed to stage volume: exit status 2
```
and the following is logged in the trident-node-linux pod:

```
time="2024-08-27T12:35:57Z" level=debug msg="<<<< devices.getDeviceInfoForLUN" iSCSINodeName="iqn.2004-04.com.qnap:ts-673a:iscsi.iscsi-talos--pvc-b271b1cd-03f6-4c32-a0cb-33a5edf2a7c7.82aad6" logLayer=csi_frontend lunID=1 needFSType=false requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="Found device." devices="[sda]" iqn="iqn.2004-04.com.qnap:ts-673a:iscsi.iscsi-talos--pvc-b271b1cd-03f6-4c32-a0cb-33a5edf2a7c7.82aad6" logLayer=csi_frontend multipathDevice= requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI scsiLun=1 workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> devices.getDeviceFSType" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[/dev/sda]" command=blkid logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=info msg="Could not get FSType for device; err: exit status 2." device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< devices.getDeviceFSType" logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> devices.isDeviceUnformatted" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[if=/dev/sda bs=4096 count=512 status=none]" command=dd logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=error msg="failed to read the device" device=/dev/sda error="exit status 2" logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
```
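The two probes that fail in the log are exactly `blkid /dev/sda` and a raw `dd` read of the device. A minimal sketch of what the node plugin appears to run during staging, reproduced here against a scratch file instead of the real LUN (on the node itself the device would be `/dev/sda`):

```shell
# Stand-in for the staged iSCSI LUN: a zeroed 8 MiB file.
DEV=$(mktemp)
dd if=/dev/zero of="$DEV" bs=1M count=8 status=none

# Probe 1: blkid reports the filesystem type; exit status 2 means
# "no recognizable filesystem", which is expected for a freshly provisioned LUN.
blkid "$DEV" || echo "blkid exited with status $?"

# Probe 2: Trident then reads the first 2 MiB (bs=4096 count=512) to verify
# the device is blank before formatting; this is the call that exits with
# status 2 in the log above.
dd if="$DEV" bs=4096 count=512 status=none | wc -c

rm -f "$DEV"
```

The `wc -c` at the end is only a visible substitute for Trident's internal check that the read bytes are all zero; the point is that both `blkid` and `dd` must be resolvable on the node's PATH.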
Do I have a configuration problem, or is this a fault?
Hi @louhisuo,
Talos is a minimal Linux OS and lacks some basic utilities (like `dd`) that are found on most Linux systems.
Our service assumes these tools are available on the node, so if they are missing, attempting to use them leads to errors.
We are aware that Talos might support Linux utility extensions, which could potentially help you install the required utilities.
Thank you.
Refs:
https://github.com/siderolabs/extensions?tab=readme-ov-file
https://github.com/siderolabs/extensions/tree/main/tools/util-linux
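As a sketch (not verified against your setup), an Image Factory schematic that bundles these official extensions would look like the following; the extension names are taken from the siderolabs/extensions repository linked above, so please verify them against your Talos version:

```yaml
# schematic.yaml -- upload to the Talos Image Factory to get an installer image ID
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools       # open-iscsi tooling needed for iSCSI attach
      - siderolabs/util-linux-tools  # provides blkid and related utilities
```

Upgrading the node with the resulting installer image should make the extensions show up in `talosctl get extensions`.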
I have added the util-linux-tools Talos extension to the cluster (see below):
```
% talosctl get extensions
NODE           NAMESPACE   TYPE              ID   VERSION   NAME               VERSION
172.16.1.244   runtime     ExtensionStatus   0    1         iscsi-tools        v0.1.4
172.16.1.244   runtime     ExtensionStatus   1    1         qemu-guest-agent   8.2.2
172.16.1.244   runtime     ExtensionStatus   2    1         util-linux-tools   2.39.3
172.16.1.244   runtime     ExtensionStatus   3    1         schematic          88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b
```
The issue still remains (logs from the trident-node-linux pod):

```
time="2024-08-28T12:09:38Z" level=debug msg="Found device." devices="[sda]" iqn="iqn.2004-04.com.qnap:ts-673a:iscsi.iscsi-talos--pvc-b4bad894-6ae3-438c-815c-6d7649c6ed54.82aad6" logLayer=csi_frontend multipathDevice= requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI scsiLun=1 workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> devices.getDeviceFSType" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[/dev/sda]" command=blkid logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=info msg="Could not get FSType for device; err: exit status 2." device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< devices.getDeviceFSType" logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> devices.isDeviceUnformatted" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[if=/dev/sda bs=4096 count=512 status=none]" command=dd logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=error msg="failed to read the device" device=/dev/sda error="exit status 2" logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< devices.isDeviceUnformatted" logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=error msg="Unable to identify if the device is unformatted; err: exit status 2" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< iscsi.AttachISCSIVolume" logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="Attach iSCSI volume is not complete, waiting." error="exit status 2" increment=5.533169717s logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
```
Are you expecting some specific Linux tool to be available on the node? If it is `dd`, then my understanding is that `dd` is not part of `util-linux` but of `coreutils`, and Talos does not have an extension that delivers the `coreutils` package.
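This is easy to confirm on any regular Linux machine, since GNU tools name their source package in their `--version` output:

```shell
# dd is shipped by GNU coreutils, not util-linux, so the util-linux-tools
# extension alone cannot provide it.
dd --version | head -n 1

# blkid, by contrast, is a util-linux tool (guarded, since it may not be
# installed on every machine).
command -v blkid >/dev/null && blkid --version || echo "blkid not installed here"
```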
Could the issue "Support for Talos" (#806) perhaps be the reason why the QNAP CSI plugin does not work with Talos Linux?
@louhisuo looks like you arrived at the same point I did. The next thing I was going to do was build a Talos extension for `coreutils`, similar to the util-linux one. It doesn't look that difficult to get going, and it should be possible to deploy it as a GitHub package; I just haven't had time to do it yet.
> Is this issue Support for Talos (#806) perhaps a reason why QNAP CSI Plugin does not work with Talos Linux?
Yes, that issue has the same root cause as this one. The unavailability of certain utilities like `dd` on the node makes the plugin unusable.
You can refer to the documentation we linked earlier for the Linux utility extensions, or seek help from Talos.
Yes, it looks like we are both hitting the same issue, @brunnels. I am looking forward to a `coreutils` extension for Talos, which would make the `dd` command available. My concern is what other tools, unknown to us, are also missing, since Talos's design principle has been to remove everything from the OS that is not required to run Kubernetes.
@davidcheng0716, if QNAP is serious about positioning their NAS products as Kubernetes storage, QNAP needs to consider making investments in this area.
(1) Refactor the QNAP CSI driver to be OS-agnostic by including all needed tools in the CSI driver itself. With this approach it will be easier for QNAP to support a wide range of operating systems and Kubernetes distributions with minimal effort. (2) Create user documentation describing how the driver should be configured to work with QNAP NAS boxes. This will reduce the support effort for QNAP engineers and increase adoption of QNAP as Kubernetes storage. (3) Add support for the other storage technologies available in QNAP NAS boxes (Samba, NFS, S3).
QNAP is far behind Synology in this regard (see below), and to be very direct, it is hard to recommend QNAP as Kubernetes storage when compared with what Synology can offer: Synology CSI Driver for Kubernetes, iSCSI Storage with Synology CSI.
I have a QNAP TS-673A running QuTS hero h5.2.0.2860. The QNAP CSI plugin version is 1.3.0, and the Kubernetes version is 1.30.3 (Talos Linux).
I am trying to initialize the QNAP CSI plugin against the following backend, using this Trident backend configuration.
When describing the `TridentBackendConfig` I see the following errors, and there are also errors in the trident-controller pod's storage-api-server and trident-main container logs... and the IP address of the Kubernetes cluster gets added to the IP block list.
Note also that Talos Linux has implemented some Kubernetes security hardening by default, and I get the following warnings when deploying the plugin, as well as when restarting the deployments and daemonset.
Please advise whether this is a fault or a configuration mistake.