evilhamsterman opened 4 months ago
Hey, thanks for the interest! We've been kicking this around for a bit and I filed an internal JIRA to move the identifier to the Kubernetes control-plane instead. I've had some heated conversations with Andrew from the Talos project and I'm not 100% sure moving the identifier to Kubernetes will solve all our problems.
If you are an existing HPE customer or a prospect, you should work with your account team and mention this requirement. That is the fastest route.
I don't think moving the ID to the control plane would solve all the problems, but it's a start. Maybe at least make it possible to set the /etc/hpe-storage mount path so we can point it at Talos' ephemeral environment? It's possible with Kustomize but that's an extra step. I do plan on talking with our account rep but wanted to get it on the board here.
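For reference, the Kustomize route would be roughly something like this. The volume name etc-hpe-storage is a guess on my part, so check the rendered DaemonSet for the real volume name and hostPath before using it:
# Render the chart, then re-point the /etc/hpe-storage hostPath somewhere writable on Talos (e.g. under /var).
# NOTE: the volume name "etc-hpe-storage" is a guess; verify it in the rendered DaemonSet.
helm template my-hpe-csi-driver datamattsson/hpe-csi-driver -n hpe-storage > rendered.yaml
cat > kustomization.yaml <<'EOF'
resources:
  - rendered.yaml
patches:
  - target:
      kind: DaemonSet
      name: hpe-csi-node
    patch: |-
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: hpe-csi-node
      spec:
        template:
          spec:
            volumes:
              - name: etc-hpe-storage   # guessed name
                hostPath:
                  path: /var/lib/hpe-storage
                  type: DirectoryOrCreate
EOF
kubectl apply -k .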
Internal JIRA is CON-1838.
Hi, is there any news about support for Talos?
It did not make it into the 2.5.0 release. I was going to do some research on it but it got delayed.
I'm glad to hear that it is actively being pursued at least. I will likely be deploying a new cluster in the relatively near future and it would be nice to be able to start with Talos.
I try not to be the one pinging for updates all the time. But I need to start deploying a bare metal Kubernetes cluster soon and I'm in a bit of a planning pickle. I'd really like to just start with Talos but can't because of the need to use Nimble for PVs. I can start with a kubeadm cluster and later migrate to Talos, but that would mean putting a bunch of effort into setting up deployment workflows that may just be abandoned shortly after. So I'm not sure how much effort I should invest in automation vs just rolling by hand for now, or using an alternative storage.
I can understand 2.5 is out of the picture; it looks like there are already betas for that. So is this planned to be included in 2.6, which based on previous release cadence we may see before EOY, or perhaps a 2.5.x release? Or is this planned for a longer timeframe, like next year? Just trying to get an idea to help with planning.
It's hard for me to gauge when we can get to a stage to support Talos and immutable nodes in general. It's very high on my list but I rarely get my way when large deals are on the table demanding feature X, Y and Z.
Also, full disclosure, we have not even scoped the next minor or patch release as we're neck deep stabilizing 2.5.0. I'll make a note and try to get it in for consideration in the next couple of releases.
If you want to email me directly at michael.mattsson at hpe.com with your company name and business relationship with HPE it will make it easier for me to talk to product management.
I don't have a Talos environment readily available, and skimming through the docs I realize I'd need firewall rules or a new deployment environment for Talos itself.
As a quick hack, can you tell me how far you get with this?
helm repo add datamattsson https://datamattsson.github.io/co-deployments/
helm repo update
helm install my-hpe-csi-driver -nhpe-storage datamattsson/hpe-csi-driver --version 2.5.0-talos --set disableNodeConfiguration=true
It looks like it is still mounting /etc/hpe-storage and causing failures due to the RO filesystem.
Ok, I had a brain fart, try now.
helm uninstall my-hpe-csi-driver -nhpe-storage
helm repo update
helm install my-hpe-csi-driver -nhpe-storage datamattsson/hpe-csi-driver --version 2.5.0-talos2 --set disableNodeConfiguration=true
Getting closer, the controller started fine but the hpe-csi-node daemonset pod is still trying to mount /etc/systemd/system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m2s default-scheduler Successfully assigned hpe-storage/hpe-csi-node-qv9xk to talos-nvj-4af
Warning Failed 2m2s kubelet Error: failed to generate container "28b5218a6cea8f05806ec4210312762aa45cc1a851befe51d3e231bb6ff95fa2" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
Warning Failed 2m2s kubelet Error: failed to generate container "5f4e20edc65d2a0990d99c0b5da15cf61f3c0273d577f1bacacbbcc49bf77ff5" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
Warning Failed 108s kubelet Error: failed to generate container "1a59382b30bcc28fca08f6b48cf9ccce5adee2d003634ab00a59c9d470ad0a3c" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
Warning Failed 97s kubelet Error: failed to generate container "bdcdb9ac2dac778320a6f1fccfa7e0198ceb9f62cce3ab03ca59b7f061442133" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
Warning Failed 85s kubelet Error: failed to generate container "97701cc024c101137235529d83b03f1461e1dd97e48c543ac5d72474362e739d" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
Warning Failed 74s kubelet Error: failed to generate container "3176d754668a42fc845d93ef4ca8b116bd59f67ec35983626e9901f70099b219" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
Warning Failed 61s kubelet Error: failed to generate container "e0d1cee086f4f574cf0e9eee92da6ba94dbaa359990e92068ec6926dd8e16d03" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
Warning Failed 46s kubelet Error: failed to generate container "f6ab37bf1edc712984ff69f9f5da848a5eb6e4cf1bec0efa8cc697cc4f776e8b" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
Warning Failed 34s kubelet Error: failed to generate container "c641f39c98980fccbac986b9c4bf7d35b2b226fc70fc12e71c54dc50b672bd77" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
Normal Pulled 7s (x11 over 2m2s) kubelet Container image "quay.io/hpestorage/csi-driver:v2.5.0-beta" already present on machine
Warning Failed 7s (x2 over 21s) kubelet (combined from similar events): Error: failed to generate container "6d45577bdd7ca1971a3eba9b3c110ea41001ed5d08cdfb91792fac458da31a37" spec: failed to generate spec: failed to mkdir "/etc/systemd/system": mkdir /etc/systemd: read-only file system
I did ensure disableNodeConfiguration is set:
❯ helm get values my-hpe-csi-driver
USER-SUPPLIED VALUES:
disableNodeConfiguration: true
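For anyone following along, a quick way to see what the chart actually rendered onto the daemonset (vs. just the Helm values) is to pull the template spec straight off it:
# See what the chart actually rendered for the node daemonset (volumes, env, etc.)
kubectl -n hpe-storage get ds hpe-csi-node -o jsonpath='{.spec.template.spec.volumes}' ; echo
kubectl -n hpe-storage get ds hpe-csi-node -o jsonpath='{.spec.template.spec.initContainers[0].env}' ; echo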
Ok, here's the next one: 2.5.0-talos3
helm uninstall my-hpe-csi-driver -nhpe-storage
helm repo update
helm install my-hpe-csi-driver -nhpe-storage datamattsson/hpe-csi-driver --version 2.5.0-talos3 --set disableNodeConfiguration=true
The pod starts but the initContainer immediately crashes
hpe-csi-node-init + '[' --endpoint=unix:///csi/csi.sock = --node-init ']'
hpe-csi-node-init + for arg in "$@"
hpe-csi-node-init + '[' --flavor=kubernetes = --node-service ']'
hpe-csi-node-init + '[' --flavor=kubernetes = --node-init ']'
hpe-csi-node-init + disableNodeConformance=
hpe-csi-node-init + disableNodeConfiguration=
hpe-csi-node-init + '[' true = true ']'
hpe-csi-node-init + '[' '' = true ']'
hpe-csi-node-init + '[' '' = true ']'
hpe-csi-node-init + '[' '' '!=' true ']'
hpe-csi-node-init + cp -f /opt/hpe-storage/lib/hpe-storage-node.service /etc/systemd/system/hpe-storage-node.service
hpe-csi-node-init + cp -f /opt/hpe-storage/lib/hpe-storage-node.sh /etc/hpe-storage/hpe-storage-node.sh
hpe-csi-node-init cp: cannot create regular file '/etc/hpe-storage/hpe-storage-node.sh': No such file or directory
It looks like the DISABLE_NODE_CONFIGURATION environment variable is not getting set on the initContainer:
spec:
  initContainers:
  - args:
    - --node-init
    - --endpoint=$(CSI_ENDPOINT)
    - --flavor=kubernetes
    env:
    - name: CSI_ENDPOINT
      value: unix:///csi/csi.sock
    image: quay.io/hpestorage/csi-driver:v2.5.0-beta
    imagePullPolicy: IfNotPresent
    name: hpe-csi-node-init
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: true
      capabilities:
        add:
        - SYS_ADMIN
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host
      mountPropagation: Bidirectional
      name: root-dir
    - mountPath: /dev
      name: device-dir
    - mountPath: /sys
      name: sys
    - mountPath: /run/systemd
      name: runsystemd
    - mountPath: /csi
      name: plugin-dir
    - mountPath: /var/lib/kubelet
      name: pods-mount-dir
    - mountPath: /var/log
      name: log-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xsr7w
      readOnly: true
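A quick workaround in the meantime might be to patch the env var onto the DS by hand, something like this:
# Append the env var the init script appears to expect to the node init container
kubectl -n hpe-storage patch ds hpe-csi-node --type=json -p='[
  {"op": "add",
   "path": "/spec/template/spec/initContainers/0/env/-",
   "value": {"name": "DISABLE_NODE_CONFIGURATION", "value": "true"}}
]'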
This is very interesting, I think you just uncovered a different bug altogether. =)
Ok, talos4 has been published.
helm uninstall my-hpe-csi-driver -nhpe-storage
helm repo update
helm install my-hpe-csi-driver -nhpe-storage datamattsson/hpe-csi-driver --version 2.5.0-talos4 --set disableNodeConfiguration=true
I edited the DS to add the environment variable and used your latest update. The initContainer succeeds now, but then I think we get to the meat of the situation: the csi-node-driver-registrar starts crashing and the hpe-csi-driver container complains it can't find initiators. It looks like part of the problem is on the Talos side: their iscsi-tools extension doesn't appear to include the multipath command (https://github.com/siderolabs/extensions/issues/134). Though democratic-csi claims it's not needed, I'm not an expert in iSCSI so I can't say how true that is: https://github.com/democratic-csi/democratic-csi/pull/225#issuecomment-1478699681
Not sure how much help it is, but looking at your code it looks like perhaps the main issue is that you're looking for the /etc/iscsi/initiatorname.iscsi file, but that file doesn't exist in the normal place on their system. Their extension bind mounts /usr/local/etc/iscsi/iscsid.conf into the extension container at /etc/iscsi/iscsid.conf (https://github.com/siderolabs/extensions/blob/f0b6082466dc78a309d1e9a7d8525497d714d4d4/storage/iscsi-tools/iscsid.yaml#L52C5-L53C42) but it doesn't mount the rest of the iSCSI folder, so the initiator name is not accessible to you. Looks to me like they need to mount the full /usr/local/etc/iscsi directory so that your driver can access that file; I assume that's how you get the initiator to register with the storage.
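Mirroring the existing iscsid.conf entry in that spec, I'd guess the extra mount they would need looks something like this (format copied by eye from the linked file, so treat it as a sketch, not a tested change):
# hypothetical addition to the iscsi-tools extension service spec
- source: /usr/local/etc/iscsi
  destination: /etc/iscsi
  type: bind
  options:
    - bind
    - rshared
    - rw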
EUREKA! I found it: they do mount the /etc/iscsi directory into /system/iscsi on the host. I shelled into the hpe-csi-node/hpe-csi-driver container and changed the link from /etc/iscsi -> /host/etc/iscsi to /host/system/iscsi, and when the registrar next restarted the driver container was able to find the initiator name and everything is now running.
❯ k get pods
NAME READY STATUS RESTARTS AGE
hpe-csi-controller-8447c48d9f-rjd49 9/9 Running 0 22m
hpe-csi-node-5t69x 2/2 Running 9 (5m45s ago) 22m
nimble-csp-74776998b6-fmcn2 1/1 Running 0 22m
primera3par-csp-58dd48cccb-lvvjb 1/1 Running 0 22m
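Roughly what I ran to re-point the link, for reference (from memory, and the pod name will differ per node):
# re-point the driver's /etc/iscsi symlink at the bind mount Talos actually provides
kubectl -n hpe-storage exec hpe-csi-node-5t69x -c hpe-csi-driver -- \
  ln -sfn /host/system/iscsi /etc/iscsi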
Obviously that will break when that pod restarts. But I then created a storage class and a PVC and it worked right away:
❯ k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
my-first-pvc Bound pvc-dc881628-ffe4-42c9-951e-e266502dd226 32Gi RWO csq-it-nimble1 <unset> 2m5s
and I can see the volume on the array.
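The storage class and claim were nothing special by the way, roughly this; the backend Secret name/namespace here are just assumptions for illustration, adjust to whatever you created:
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csq-it-nimble1
provisioner: csi.hpe.com
parameters:
  # Secret pointing at the Nimble array; name/namespace are assumptions
  csi.storage.k8s.io/provisioner-secret-name: hpe-backend
  csi.storage.k8s.io/provisioner-secret-namespace: hpe-storage
  csi.storage.k8s.io/controller-publish-secret-name: hpe-backend
  csi.storage.k8s.io/controller-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-stage-secret-name: hpe-backend
  csi.storage.k8s.io/node-stage-secret-namespace: hpe-storage
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-first-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 32Gi
  storageClassName: csq-it-nimble1
EOF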
The last step, mounting it, is the one remaining issue: it does not successfully mount the volume. That appears to actually be related to the multipath issue I mentioned. I'm signing off for the weekend; I'll look more on Monday.
I should've researched this but is /usr/local/etc writable in Talos? (Or, what directory IS writable and/or persistent on Talos?) I'm thinking we could just add a Helm chart parameter for CSI driver users to relocate /etc to whatever directory on the node.
As for commands the CSI driver needs to have available, look for clues here: https://github.com/hpe-storage/csi-driver/blob/master/Dockerfile
As for the multipath issue, the HPE CSI Driver requires multipath/multipathd on the host; there's no workaround as we don't even consider non-multipath entries.
I'm out of pocket for the rest of the weekend as well, cheers!
I've exhausted the time I can work on this for now, but this is what I found messing around some more. Hopefully it can help you get on the correct path, but it certainly looks like it's going to require more work than just changing the mount location. It does give me a little better idea for my planning though; I'll probably need to plan on a longer timeline for support.
It looks like /system is supposed to be the location for "persistent" data, but it appears they mean persistent for the extension container lifecycle: the data is persistent if you restart the extension, but it is not persistent over reboots. The path /system/state, which contains the node config, is persistent, and /var, which is used as storage for container images, is persistent across reboots but I believe is not guaranteed.
However, because the extensions are not persistent across reboots, things like the initiator name are not consistent; a new one is generated on every boot. Because of this I don't think it's a good idea to try to persist your node ID on disk like we discussed earlier. Either it should be generated dynamically, or you could use the Kubernetes node ID and store extra persistent data in a ConfigMap or CRD. In my opinion this is more in line with the general idea of Kubernetes anyway and cattle-vs-pets workflows.
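To make that concrete, I'm picturing something like a per-node object the driver owns, e.g. (purely hypothetical shape on my part, not an existing HPE CRD or ConfigMap, and the values are only examples):
# hypothetical per-node state the driver could own instead of /etc/hpe-storage
apiVersion: v1
kind: ConfigMap
metadata:
  name: hpe-nodeinfo-talos-nvj-4af   # keyed off the Kubernetes node name
  namespace: hpe-storage
data:
  nodeId: talos-nvj-4af
  iqn: iqn.2005-03.org.open-iscsi:generated-on-first-boot   # example value only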
Overall I see two, maybe three, major problems. One will require changes from Talos, the other will require work on your driver.
Because iscsi-tools runs as an OS-level container it also has a very limited subset of tools. I was able to get iscsiadm to work by changing the chroot script to use nsenter instead, though maybe it would work without using env.
#!/bin/bash
# Run iscsiadm inside the iscsid extension container's mount and network
# namespaces instead of chroot'ing into the (read-only) host filesystem.
iscsi_pid=$(pgrep -f "iscsid -f")
nsenter --mount="/proc/$iscsi_pid/ns/mnt" --net="/proc/$iscsi_pid/ns/net" -- /usr/local/sbin/iscsiadm "${@:1}"
/ # find /host -name "*bin" -type d 2>/dev/null | grep -v var | grep -v container | xargs ls
/host/bin:
containerd containerd-shim-runc-v2
containerd-shim runc
/host/opt/cni/bin: bandwidth firewall ipvlan ptp tuning bridge flannel loopback sbr vlan dhcp host-device macvlan static vrf dummy host-local portmap tap
/host/sbin: blkdeactivate lvm udevadm dashboard lvm_import_vdo udevd dmsetup lvmconfig vgcfgbackup dmstats lvmdevices vgcfgrestore dmstats.static lvmdiskscan vgchange fsadm lvmdump vgck fsck.xfs lvmsadc vgconvert init lvmsar vgcreate ip6tables lvreduce vgdisplay ip6tables-apply lvremove vgexport ip6tables-legacy lvrename vgextend ip6tables-legacy-restore lvresize vgimport ip6tables-legacy-save lvs vgimportclone ip6tables-restore lvscan vgimportdevices ip6tables-save mkfs.xfs vgmerge iptables modprobe vgmknodes iptables-apply poweroff vgreduce iptables-legacy pvchange vgremove iptables-legacy-restore pvck vgrename iptables-legacy-save pvcreate vgs iptables-restore pvdisplay vgscan iptables-save pvmove vgsplit lvchange pvremove wrapperd lvconvert pvresize xfs_repair lvcreate pvs xtables-legacy-multi lvdisplay pvscan lvextend shutdown
/host/usr/bin: udevadm
/host/usr/local/bin:
/host/usr/local/sbin: brcm_iscsiuio iscsi_offload iscsiuio iscsi-gen-initiatorname iscsiadm tgtadm iscsi-iname iscsid tgtd iscsi_discovery iscsid-wrapper tgtimg iscsi_fw_login iscsistart
/host/usr/sbin: cryptsetup mkfs.fat xfs_freeze xfs_ncheck dosfsck mkfs.msdos xfs_fsr xfs_quota dosfslabel mkfs.vfat xfs_growfs xfs_rtcp fatlabel veritysetup xfs_info xfs_scrub fsck.fat xfs_admin xfs_io xfs_scrub_all fsck.msdos xfs_bmap xfs_logprint xfs_spaceman fsck.vfat xfs_copy xfs_mdrestore integritysetup xfs_db xfs_metadump mkdosfs xfs_estimate xfs_mkfile
Thanks for the additional context. This definitely needs more work. I'm just puzzled how we can't even persist an IQN on the host though? Do we need to grab the first boot one and store in our CRD and regenerate the host IQN from that?
I guess FC wouldn't have as many problems but we still would need multipath/multipathd regardless. Not having ext4 available will also create problems for our NFS server implementation for RWX claims that doesn't play nicely with XFS in failure scenarios.
> Thanks for the additional context. This definitely needs more work. I'm just puzzled how we can't even persist an IQN on the host though? Do we need to grab the first boot one and store in our CRD and regenerate the host IQN from that?

It doesn't look like you can manage the IQN, their service generates one itself.
Just my thoughts, but I can think of two ways to deal with it.
> I guess FC wouldn't have as many problems but we still would need multipath/multipathd regardless. Not having ext4 available will also create problems for our NFS server implementation for RWX claims that doesn't play nicely with XFS in failure scenarios.
Looking around at other CSI iSCSI implementations, it looks like many of them use their own mkfs and mount binaries rather than rely on the host.
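i.e. the pattern seems to be to ship the filesystem tools in the plugin image and run them in the container's own mount namespace rather than chrooting to the host, something like this (illustrative only, not what the HPE driver does today, and the device/target paths are made up):
# inside the node plugin container: format and mount with the image's own tools,
# no dependency on what the host OS ships (device/target paths are illustrative)
mkfs.ext4 -F /dev/mapper/mpatha
mount -t ext4 /dev/mapper/mpatha /var/lib/kubelet/plugins/csi.hpe.com/staging/pvc-example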
Talos is becoming more popular but currently the csi-driver doesn't work with it. If we need to do manual configuration of things like iSCSI and multipath we can do that by pushing values/files in the machine config. But the biggest hitch to me appears to be the requirement to create and mount /etc/hpe-storage on the host. That works on CoreOS but does not on Talos because basically the whole system is RO. From what I can see that mount is needed to store a unique ID for the node; couldn't you use the already existing unique ID and store specific data in ConfigMaps?
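For example, pushing a config file through the machine config would look roughly like this; the destination path and the multipath settings are only illustrative (Talos restricts where machine.files can write, so check their docs for what's actually allowed):
# sketch of a machine config patch - path and content are illustrative, untested
cat > hpe-files-patch.yaml <<'EOF'
machine:
  files:
    - path: /var/etc/multipath/multipath.conf
      permissions: 0o644
      op: create
      content: |
        defaults {
          user_friendly_names yes
        }
EOF
talosctl patch machineconfig --nodes <node-ip> --patch @hpe-files-patch.yaml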