louhisuo opened this issue 3 months ago
Please use the NAS's account and password, not CHAP A/P.
Thanks.
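For anyone landing here: in practice this means storing the NAS login account in an ordinary Kubernetes Secret and pointing the backend configuration at it, rather than using CHAP credentials. A minimal sketch follows; the Secret format is standard Kubernetes, but the name, namespace, and key names are assumptions, so check the QNAP CSI Plugin documentation for the exact keys its backend config expects.

apiVersion: v1
kind: Secret
metadata:
  name: quts-hero-credentials
  namespace: trident          # assumption: the namespace the plugin is installed in
type: Opaque
stringData:
  username: admin                       # the NAS login account, not a CHAP user
  password: "<nas-account-password>"    # placeholder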
Thank you. Based on the above I managed to make progress; however, I am now hitting another issue which looks very similar to "pvc is created but pod is unable to mount the volume" (#13). I am also running a Talos Linux single-node cluster.
I made some progress and can configure the backend with the previously defined TridentBackendConfig. Now I am facing an issue where the pod is not able to consume the PVC and gets stuck in status ContainerCreating.
I have the following StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: quts-hero-ssd-raid1
provisioner: csi.trident.qnap.io
parameters:
  selector: "performance=basic"
allowVolumeExpansion: true
the following PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: quts-hero-test-pvc
spec:
  storageClassName: quts-hero-ssd-raid1
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
and the following Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-one
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multi-deployment
  template:
    metadata:
      labels:
        app: multi-deployment
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: storage
              mountPath: /tmp/k8s
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: quts-hero-test-pvc
I see the following event on the pod.
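(Captured with something like kubectl describe, using the Deployment's pod label from the manifest above:)

kubectl describe pod -l app=multi-deployment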
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 41s (x115 over 26h) kubelet MountVolume.MountDevice failed for volume "pvc-b271b1cd-03f6-4c32-a0cb-33a5edf2a7c7" : rpc error: code = Internal desc = rpc error: code = Internal desc = failed to stage volume: exit status 2
and the following logged in the trident-node-linux pod.
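(Collected with something like the command below; the trident namespace and trident-main container name are assumptions based on a default install:)

kubectl logs -n trident daemonset/trident-node-linux -c trident-main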
time="2024-08-27T12:35:57Z" level=debug msg="<<<< devices.getDeviceInfoForLUN" iSCSINodeName="iqn.2004-04.com.qnap:ts-673a:iscsi.iscsi-talos--pvc-b271b1cd-03f6-4c32-a0cb-33a5edf2a7c7.82aad6" logLayer=csi_frontend lunID=1 needFSType=false requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="Found device." devices="[sda]" iqn="iqn.2004-04.com.qnap:ts-673a:iscsi.iscsi-talos--pvc-b271b1cd-03f6-4c32-a0cb-33a5edf2a7c7.82aad6" logLayer=csi_frontend multipathDevice= requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI scsiLun=1 workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> devices.getDeviceFSType" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[/dev/sda]" command=blkid logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=info msg="Could not get FSType for device; err: exit status 2." device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< devices.getDeviceFSType" logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> devices.isDeviceUnformatted" device=/dev/sda logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[if=/dev/sda bs=4096 count=512 status=none]" command=dd logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
time="2024-08-27T12:35:57Z" level=error msg="failed to read the device" device=/dev/sda error="exit status 2" logLayer=csi_frontend requestID=078c660a-a0ab-4333-8520-9a9720e229ff requestSource=CSI workflow="node_server=stage"
Do I have a configuration problem, or is this a fault?
Hi @louhisuo,
Talos is a minimal Linux OS, and it lacks some basic utilities (like dd and others) that are typically found in most Linux systems.
Our service assumes that these tools are available on the node, so if they are missing, attempting to use them could lead to errors.
We are aware that Talos might support Linux utility extensions, which could potentially help you install the required utilities.
Thank you.
Refs:
https://github.com/siderolabs/extensions?tab=readme-ov-file
https://github.com/siderolabs/extensions/tree/main/tools/util-linux
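For reference, with the Talos image factory the extension set is declared in a schematic along these lines (the iscsi-tools and qemu-guest-agent entries mirror what this thread's node already runs; treat it as a sketch):

customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/qemu-guest-agent
      - siderolabs/util-linux-tools

Upload the schematic to factory.talos.dev and upgrade the node to the resulting image.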
I have added the util-linux-tools Talos extension to the cluster (see below):
% talosctl get extensions
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.16.1.244 runtime ExtensionStatus 0 1 iscsi-tools v0.1.4
172.16.1.244 runtime ExtensionStatus 1 1 qemu-guest-agent 8.2.2
172.16.1.244 runtime ExtensionStatus 2 1 util-linux-tools 2.39.3
172.16.1.244 runtime ExtensionStatus 3 1 schematic 88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b
The issue still remains (logs from the trident-node-linux pod):
time="2024-08-28T12:09:38Z" level=debug msg="Found device." devices="[sda]" iqn="iqn.2004-04.com.qnap:ts-673a:iscsi.iscsi-talos--pvc-b4bad894-6ae3-438c-815c-6d7649c6ed54.82aad6" logLayer=csi_frontend multipathDevice= requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI scsiLun=1 workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> devices.getDeviceFSType" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[/dev/sda]" command=blkid logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=info msg="Could not get FSType for device; err: exit status 2." device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< devices.getDeviceFSType" logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> devices.isDeviceUnformatted" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[if=/dev/sda bs=4096 count=512 status=none]" command=dd logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=error msg="failed to read the device" device=/dev/sda error="exit status 2" logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< devices.isDeviceUnformatted" logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=error msg="Unable to identify if the device is unformatted; err: exit status 2" device=/dev/sda logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="<<<< iscsi.AttachISCSIVolume" logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
time="2024-08-28T12:09:38Z" level=debug msg="Attach iSCSI volume is not complete, waiting." error="exit status 2" increment=5.533169717s logLayer=csi_frontend requestID=e68df392-5823-4884-83d6-ec5539266468 requestSource=CSI workflow="node_server=stage"
Are you expecting some specific Linux tool to be available on the node? If it is dd, then my understanding is that dd is not part of util-linux-tools but of coreutils, and Talos does not have an extension which delivers the coreutils package.
Is this issue, Support for Talos (#806), perhaps the reason why the QNAP CSI Plugin does not work with Talos Linux?
@louhisuo looks like you arrived at the same point I did. The next thing I was going to do was build a Talos extension for coreutils, similar to the util-linux one. It doesn't look that difficult to get going, and it should be possible to deploy it as a GitHub package; I just haven't had time to do it yet.
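For reference, the extensions in the siderolabs/extensions repo each ship a manifest.yaml along these lines; the sketch below is untested and the version pins are placeholders:

version: v1alpha1
metadata:
  name: coreutils
  version: "9.5"              # placeholder: the coreutils release being packaged
  author: <your-name>         # placeholder
  description: |
    GNU coreutils (dd and friends) for Talos nodes.
  compatibility:
    talos:
      version: ">= v1.7.0"    # placeholder: the Talos releases to target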
Is this issue Support for Talos (#806) perhaps a reason why QNAP CSI Plugin does not work with Talos Linux?
Yes, this issue has the same root cause as ours: the unavailability of certain utilities like dd on the node makes the plugin unusable.
You can reference the documents we provided earlier for the Linux utility extensions, or seek help from Talos.
Yes, it looks like we are both hitting the same issue, @brunnels. Looking forward to having a coreutils extension for Talos OS, which would bring the dd command into Talos OS. My concern here is what other tools, unknown to us, are missing in Talos OS, as their design principle has been to remove everything from the OS that is not required to run Kubernetes.
@davidcheng0716, if QNAP is serious about positioning their NAS products as Kubernetes storage, QNAP needs to consider making investments in this area:
1. Refactor the QNAP CSI Driver to be OS-agnostic by including all needed tools in the CSI driver itself. With this approach it will be easier for QNAP to support a wide range of operating systems and Kubernetes distributions with minimal effort.
2. Create user documentation which describes how the driver should be configured to work with QNAP NAS boxes. This will reduce support effort from QNAP engineers and increase adoption of QNAP as Kubernetes storage.
3. Add support for the other storage technologies available in QNAP NAS boxes (Samba, NFS, S3).
QNAP is way behind Synology in this regard (see below), and to be very direct, it is very hard to recommend QNAP as Kubernetes storage when comparing with what Synology can offer: Synology CSI Driver for Kubernetes, iSCSI Storage with Synology CSI.
@louhisuo, your concern seems very well founded in this case. I hacked together a coreutils extension, so now the QNAP CSI Plugin can tell that the disk isn't formatted, and now it wants mkfs.ext4. I might play whack-a-mole with this a bit and see if I can turn it into a proper qnap support extension.
A few rounds of whack-a-mole later...
It seems to be getting past the dd issue, and several others, and I don't see any more obviously missing executables in the logs, but it's still not working.
It says "Mount information not found", but I can't tell what command it's trying to use to determine that. The part below looks like it may be missing something, but it's unclear what; the panic comes from chwrap, which appears to be the wrapper the plugin uses to run external binaries, so presumably either the mount binary or one of the paths it needs does not exist where chwrap looks:
time="2024-09-19T04:06:06Z" level=debug msg=">>>> command.Execute." args="[/dev/sda /var/lib/kubelet/pods/dce3d35d-0b1d-4eba-80fa-576f42c5f447/volumes/kubernetes.io~csi/pvc-e9f3ea16-6537-4887-9175-35d8fc81791d/mount]" command=mount logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< Execute." command=mount error="exit status 2" logLayer=csi_frontend output="panic: no such file or directory\n\ngoroutine 1 [running]:\nmain.main()\n\t/go/src/github.com/qnap/trident/chwrap/chwrap.go:104 +0x32e" requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
More log context here:
time="2024-09-19T04:06:05Z" level=debug msg="Found iSCSI host/session." hostNumber=2 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI sessionNumber=1 workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< iscsi.GetISCSIHostSessionMapForTarget" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Built iSCSI host/session map." hostSessionMap="map[2:1]" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> iscsi.iSCSIScanTargetLUN" hosts="[2]" logLayer=csi_frontend lunID=0 requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Invoked SCSI scan for host." host=2 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI scanCmd="0 0 0" scanFile=/sys/class/scsi_host/host2/scan workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< iscsi.iSCSIScanTargetLUN" hosts="[2]" logLayer=csi_frontend lunID=0 requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Scanning paths: [/sys/class/scsi_host/host2/device/session1/iscsi_session/session1/device/target2:0:0/2:0:0:0]" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Paths found: /sys/class/scsi_host/host2/device/session1/iscsi_session/session1/device/target2:0:0/2:0:0:0/block" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="All Paths found: [/sys/class/scsi_host/host2/device/session1/iscsi_session/session1/device/target2:0:0/2:0:0:0/block]" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< iscsi.waitForDeviceScan" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend lunID=0 requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> devices.waitForMultipathDeviceForLUN" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend lunID=0 requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> iscsi.GetISCSIHostSessionMapForTarget" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Found iSCSI host/session." hostNumber=2 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI sessionNumber=1 workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< iscsi.GetISCSIHostSessionMapForTarget" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< devices.waitForMultipathDeviceForLUN" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend lunID=0 requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> devices.getDeviceInfoForLUN" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend lunID=0 needFSType=false requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> iscsi.GetISCSIHostSessionMapForTarget" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Found iSCSI host/session." hostNumber=2 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI sessionNumber=1 workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< iscsi.GetISCSIHostSessionMapForTarget" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> devices.findMultipathDeviceForDevice" device=sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Could not find multipath device for device." device=sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< devices.findMultipathDeviceForDevice" device=sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Found SCSI device." deviceNames="[sda]" fsType= hostSessionMap="map[2:1]" logLayer=csi_frontend lun=0 multipathDevice= requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< devices.getDeviceInfoForLUN" iSCSINodeName=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend lunID=0 needFSType=false requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Found device." devices="[sda]" iqn=iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d logLayer=csi_frontend multipathDevice= requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI scsiLun=0 workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> devices.getDeviceFSType" device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Device found." device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< devices.waitForDevice" device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[/dev/sda]" command=blkid logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=info msg="Could not get FSType for device; err: exit status 2." device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< devices.getDeviceFSType" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> devices.isDeviceUnformatted" device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> command.ExecuteWithTimeout." args="[if=/dev/sda bs=4096 count=512 status=none]" command=dd logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI timeout=5s workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< command.ExecuteWithTimeout." logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Verified correct number of bytes read." device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=info msg="Device is unformatted." device=/dev/sda logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< devices.isDeviceUnformatted" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Formatting LUN." fstype=ext4 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI volume=trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> filesystem.formatVolume" device=/dev/sda fsType=ext4 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> command.Execute." args="[-F /dev/sda]" command=mkfs.ext4 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< Execute." command=mkfs.ext4 error="<nil>" logLayer=csi_frontend output="mke2fs 1.47.1 (20-May-2024)\nDiscarding device blocks: 0/2621440\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b \b\b\b\b\b\b\b\b\b\b\b\b\b\b\bdone \nCreating filesystem with 2621440 4k blocks and 655360 inodes\nFilesystem UUID: 4318f0d8-2caa-4760-b9fc-55600243c423\nSuperblock backups stored on blocks: \n\t32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632\n\nAllocating group tables: 0/80\b\b\b\b\b \b\b\b\b\bdone \nWriting inode tables: 0/80\b\b\b\b\b \b\b\b\b\bdone \nCreating journal (16384 blocks): done\nWriting superblocks and filesystem accounting information: 0/80\b\b\b\b\b \b\b\b\b\bdone\n" requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< filesystem.formatVolume" device=/dev/sda fsType=ext4 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> mount_linux.IsMounted" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI source=/dev/sda target= workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="Mount information not found." logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI source=/dev/sda target= workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg="<<<< mount_linux.IsMounted" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI source=/dev/sda target= workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> filesystem.repairVolume" device=/dev/sda fsType=ext4 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:05Z" level=debug msg=">>>> command.Execute." args="[-p /dev/sda]" command=fsck.ext4 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< Execute." command=fsck.ext4 error="<nil>" logLayer=csi_frontend output="/dev/sda: clean, 12/655360 files, 67265/2621440 blocks" requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< filesystem.repairVolume" device=/dev/sda fsType=ext4 logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< iscsi.AttachISCSIVolume" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< iscsi.AttachISCSIVolumeRetry" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=debug msg="Writing temporary tracking info file." logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI tempFile=/var/lib/trident/tracking/tmp-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d.json workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=debug msg="Updating tracking info file." fileName=/var/lib/trident/tracking/pvc-e9f3ea16-6537-4887-9175-35d8fc81791d.json logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=error msg="Failed to add portal to self-healing map; err: portal value cannot be empty" logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=debug msg="Released shared lock (NodeStageVolume-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d)." lock=csi_node_server logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< NodeStageVolume" Method=NodeStageVolume Type=CSI_Node logLayer=csi_frontend requestID=8a428070-fe50-4693-8bb8-9c9d1a9a7e68 requestSource=CSI workflow="node_server=stage"
time="2024-09-19T04:06:06Z" level=debug msg="GRPC call: /csi.v1.Node/NodeGetCapabilities" Request="GRPC request: " logLayer=csi_frontend requestID=6b65686f-af93-4109-94e7-3feee6b361d9 requestSource=CSI
time="2024-09-19T04:06:06Z" level=debug msg="GRPC call: /csi.v1.Node/NodeGetCapabilities" Request="GRPC request: " logLayer=csi_frontend requestID=d78e313f-9e87-497a-ab33-3b0d42b5da3c requestSource=CSI
time="2024-09-19T04:06:06Z" level=debug msg="GRPC call: /csi.v1.Node/NodeGetCapabilities" Request="GRPC request: " logLayer=csi_frontend requestID=ea5ebb3a-77a4-4db7-8999-95359cd96c93 requestSource=CSI
time="2024-09-19T04:06:06Z" level=debug msg="GRPC call: /csi.v1.Node/NodePublishVolume" Request="GRPC request: volume_id:\"pvc-e9f3ea16-6537-4887-9175-35d8fc81791d\" publish_context:<key:\"LUKSEncryption\" value:\"\" > publish_context:<key:\"SANType\" value:\"iscsi\" > publish_context:<key:\"filesystemType\" value:\"ext4\" > publish_context:<key:\"iscsiIgroup\" value:\"\" > publish_context:<key:\"iscsiInterface\" value:\"\" > publish_context:<key:\"iscsiLunNumber\" value:\"0\" > publish_context:<key:\"iscsiLunSerial\" value:\"\" > publish_context:<key:\"iscsiTargetIqn\" value:\"iscsi-trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d\" > publish_context:<key:\"iscsiTargetPortalCount\" value:\"2\" > publish_context:<key:\"mountOptions\" value:\"\" > publish_context:<key:\"p1\" value:\"\" > publish_context:<key:\"p2\" value:\"192.168.48.17\" > publish_context:<key:\"protocol\" value:\"block\" > publish_context:<key:\"sharedTarget\" value:\"false\" > publish_context:<key:\"useCHAP\" value:\"false\" > staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/csi.trident.qnap.io/b70f8f463738d802b24f20ddc07c825f3f4d819a16fbc21134bcd2298552a654/globalmount\" target_path:\"/var/lib/kubelet/pods/dce3d35d-0b1d-4eba-80fa-576f42c5f447/volumes/kubernetes.io~csi/pvc-e9f3ea16-6537-4887-9175-35d8fc81791d/mount\" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_MULTI_WRITER > > volume_context:<key:\"backendUUID\" value:\"bef1f4d1-194b-464c-bb4f-b198c940891d\" > volume_context:<key:\"internalID\" value:\"0334ceed-d841-4273-9429-94b58d3880eb\" > volume_context:<key:\"internalName\" value:\"trident-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d\" > volume_context:<key:\"name\" value:\"pvc-e9f3ea16-6537-4887-9175-35d8fc81791d\" > volume_context:<key:\"protocol\" value:\"block\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1726714173292-3016-csi.trident.qnap.io\" > " logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI
time="2024-09-19T04:06:06Z" level=debug msg=">>>> NodePublishVolume" Method=NodePublishVolume Type=CSI_Node logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="Attempting to acquire shared lock (NodePublishVolume-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d); 1 position in the queue." lock=csi_node_server logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="Acquired shared lock (NodePublishVolume-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d)." lock=csi_node_server logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="Volume tracking info found." logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI volumeTrackingInfo="UseCHAP:false IscsiUsername:<REDACTED> IscsiInitiatorSecret:<REDACTED> IscsiTargetUsername:<REDACTED> IscsiTargetSecret:<REDACTED> " workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg=">>>> mount_linux.MountDevice" device=/dev/sda logLayer=csi_frontend mountpoint="/var/lib/kubelet/pods/dce3d35d-0b1d-4eba-80fa-576f42c5f447/volumes/kubernetes.io~csi/pvc-e9f3ea16-6537-4887-9175-35d8fc81791d/mount" options= requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg=">>>> mount_linux.IsMounted" logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI source=/dev/sda target="/var/lib/kubelet/pods/dce3d35d-0b1d-4eba-80fa-576f42c5f447/volumes/kubernetes.io~csi/pvc-e9f3ea16-6537-4887-9175-35d8fc81791d/mount" workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="Mount information not found." logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI source=/dev/sda target="/var/lib/kubelet/pods/dce3d35d-0b1d-4eba-80fa-576f42c5f447/volumes/kubernetes.io~csi/pvc-e9f3ea16-6537-4887-9175-35d8fc81791d/mount" workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< mount_linux.IsMounted" logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI source=/dev/sda target="/var/lib/kubelet/pods/dce3d35d-0b1d-4eba-80fa-576f42c5f447/volumes/kubernetes.io~csi/pvc-e9f3ea16-6537-4887-9175-35d8fc81791d/mount" workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="Already mounted: false, mountpoint exists: false" logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg=">>>> command.Execute." args="[/dev/sda /var/lib/kubelet/pods/dce3d35d-0b1d-4eba-80fa-576f42c5f447/volumes/kubernetes.io~csi/pvc-e9f3ea16-6537-4887-9175-35d8fc81791d/mount]" command=mount logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< Execute." command=mount error="exit status 2" logLayer=csi_frontend output="panic: no such file or directory\n\ngoroutine 1 [running]:\nmain.main()\n\t/go/src/github.com/qnap/trident/chwrap/chwrap.go:104 +0x32e" requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=error msg="Mount failed." error="exit status 2" logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< mount_linux.MountDevice" logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="Released shared lock (NodePublishVolume-pvc-e9f3ea16-6537-4887-9175-35d8fc81791d)." lock=csi_node_server logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=debug msg="<<<< NodePublishVolume" Method=NodePublishVolume Type=CSI_Node logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI workflow="node_server=publish"
time="2024-09-19T04:06:06Z" level=error msg="GRPC error: rpc error: code = Internal desc = unable to mount device; exit status 2" logLayer=csi_frontend requestID=f8b12c63-c85c-4243-b039-ad7f4eb5486c requestSource=CSI
time="2024-09-19T04:06:12Z" level=debug msg="REST API call received." Duration="13.858µs" Method=GET RequestURL=/readiness Route=ReadinessProbe logLayer=rest_frontend requestID=af09f42f-f607-487b-9bf4-4c196611230f requestSource=REST workflow="trident_rest=logger"
time="2024-09-19T04:06:12Z" level=debug msg="REST API call complete." Duration="186.111µs" Method=GET RequestURL=/readiness Route=ReadinessProbe StatusCode=200 logLayer=rest_frontend requestID=af09f42f-f607-487b-9bf4-4c196611230f requestSource=REST workflow="trident_rest=logger"
time="2024-09-19T04:06:12Z" level=debug msg="REST API call received." Duration="7.58µs" Method=GET RequestURL=/liveness Route=LivenessProbe logLayer=rest_frontend requestID=3c48c703-d968-46ae-a7fc-d20d08985530 requestSource=REST workflow="trident_rest=logger"
Hi michelle, when you get this running, would you consider checking in with siderolabs to maybe get this published as an official extension and part of the Talos factory?
@sempex I highly doubt they will. Their reply will just be to add the packages to the container, which is the right answer, and something we could do ourselves if this project's source were open. It's based on the open-source Trident driver, so it probably should be. I sent an email to QNAP requesting the source a few months back but got no reply.
@brunnels ahh, that sucks. It should not be that hard to get this running, but why do you think they won't add it as an available system extension, like util-linux-tools for example? I mean, that would probably help a lot of people...
@michelle-avery Actually, now that I think more about it, we might be able to build a new container based on the old one and specify that in the Helm chart or operator. What is the image name of the trident-node-linux pod? I don't have Trident installed in my cluster right now or I would look myself.
If that image is defined in the Helm values.yaml, it should be possible to use it as a base image and install whatever packages are missing.
@michelle-avery one other thing to check: Talos's default security policy is to not allow privileged pods. Do you have this label on the trident namespace? pod-security.kubernetes.io/enforce: privileged
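(For reference, assuming the namespace is literally named trident, the label can be applied with:)

kubectl label namespace trident pod-security.kubernetes.io/enforce=privileged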
@brunnels - I did have the namespace labeled correctly. The image, defined here, is qnapsystem/qnap-csi:v1.3.0. I was assuming the commands were being executed on the host because they needed to be (specifically, because the iSCSI drive is mounted directly on the host), but I haven't validated that. If they don't need to be, that approach might work. I was also looking at the NetApp Trident source, and so far everything I added to the Talos extension seems to also be visible in that source code, so using that as a proxy for the QNAP source code may help. Unfortunately I had some hardware issues in my homelab that make it urgent to get my Talos cluster up and running, so I'm using a different storage solution for now. I may be able to get back to trying this in a week or two.
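If the tools don't need to live on the host, the rebase idea would look roughly like this untested sketch; the base image's package manager is an assumption (swap apt-get for apk or dnf as appropriate):

# Dockerfile: extend the published node image with the tools the plugin shells out to.
FROM qnapsystem/qnap-csi:v1.3.0
RUN apt-get update && \
    apt-get install -y --no-install-recommends coreutils util-linux e2fsprogs mount && \
    rm -rf /var/lib/apt/lists/*

Then point the Helm values (or the operator's image field) at the rebuilt image.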
@michelle-avery I don't think they are necessary on the host. I was able to use a talos-debug-tools pod on the same node, in the qnap trident namespace, and run the same dd commands on the iSCSI device that the trident-node-linux pod was unable to run.
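Roughly, a privileged pod pinned to the same node can reproduce the probe from the trident logs; this is a sketch, with the namespace, node name, and image as placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: node-debug
  namespace: trident            # placeholder: the plugin's namespace
spec:
  nodeName: <talos-node>        # placeholder: the affected node
  restartPolicy: Never
  containers:
    - name: debug
      image: alpine             # any image that ships dd works here
      command: ["sleep", "3600"]
      securityContext:
        privileged: true        # privileged: host block devices appear in /dev

Then mirror the plugin's unformatted-device check (bs/count taken from the logs above):

kubectl exec -n trident node-debug -- dd if=/dev/sda bs=4096 count=512 2>/dev/null | wc -c
# expect 2097152 (512 * 4096) bytes if the LUN is readable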
@brunnels @louhisuo @michelle-avery @sempex QNAP is currently developing Talos compatibility for the QNAP CSI Plug-in. I will keep you updated; please look forward to it.
@JimmyTanMPM Any updates? I have time to test this weekend if you have an alpha or similar available.
I have a QNAP TS-673A running QuTS hero h5.2.0.2860. The QNAP CSI Plugin version is 1.3.0; the Kubernetes version is 1.30.3 (Talos Linux).
I am trying to initialize the qnap csi plugin against the following backend, using this trident backend configuration.
When describing the TridentBackendConfig I see the following errors, and there are also errors in the trident-controller pod's storage-api-server and trident-main container logs... and the IP address of the Kubernetes cluster gets added to the IP block list.
Note also that Talos Linux has implemented some Kubernetes security hardening by default, and I get the following type of warnings when deploying the plugin, as well as when restarting deployments and the daemonset.
Please advise whether this is a fault or a configuration mistake.