hpe-storage / truenas-csp

TrueNAS Container Storage Provider for HPE CSI Driver for Kubernetes
https://scod.hpedev.io
MIT License

Unable to connect to csi.sock #56

Open jlpedrosa opened 5 months ago

jlpedrosa commented 5 months ago

I'm unable to get the CSI driver to start correctly, specifically the HPE part.

The DaemonSet's csi-node-driver-registrar container keeps logging:

W0331 13:44:10.456786       1 connection.go:183] Still connecting to unix:///csi/csi.sock
W0331 13:44:20.456542       1 connection.go:183] Still connecting to unix:///csi/csi.sock
W0331 13:44:30.461561       1 connection.go:183] Still connecting to unix:///csi/csi.sock
W0331 13:44:40.456518       1 connection.go:183] Still connecting to unix:///csi/csi.sock
...

The other container logs a lot of errors, and I don't know whether they are fatal (the last one looks like it might be):

+ echo 'starting csi plugin...'
+ exec /bin/csi-driver --endpoint=unix:///csi/csi.sock --node-service --flavor=kubernetes
starting csi plugin...
time="2024-03-31T13:43:11Z" level=info msg="Initialized logging." alsoLogToStderr=true logFileLocation=/var/log/hpe-csi-node.log logLevel=info
time="2024-03-31T13:43:11Z" level=info msg="**********************************************" file="csi-driver.go:54"
time="2024-03-31T13:43:11Z" level=info msg="*************** HPE CSI DRIVER ***************" file="csi-driver.go:55"
time="2024-03-31T13:43:11Z" level=info msg="**********************************************" file="csi-driver.go:56"
time="2024-03-31T13:43:11Z" level=info msg=">>>>> CMDLINE Exec, args: []" file="csi-driver.go:58"
time="2024-03-31T13:43:11Z" level=info msg="got OS details as [redhat 9 2 5.15.0-1049-raspi]\n" file="os.go:95"
time="2024-03-31T13:43:16Z" level=warning msg="Distro section: Ubuntu , not present for deviceType: Nimble , using default config" file="config.go:247"
time="2024-03-31T13:43:16Z" level=info msg="No further iSCSI recommendations are found for this host" file="iscsi.go:399"
time="2024-03-31T13:43:16Z" level=error msg="open /sys/firmware/dmi/tables/DMI: no such file or directory" file="file.go:117"
time="2024-03-31T13:43:16Z" level=error msg="unable to get system information using sysfs as well open /sys/firmware/dmi/tables/DMI: no such file or directory" file="system.go:108"
time="2024-03-31T13:43:16Z" level=error msg="unable to determine if system is running as a virtual machine cannot determine if system is of type virtual machine, parameter not found in system information string: manufacturer" file="multipath.go:182"
time="2024-03-31T13:43:16Z" level=error msg="unable to determine if multipath is required cannot determine if system is of type virtual machine, parameter not found in system information string: manufacturer" file="multipath.go:202"
time="2024-03-31T13:43:16Z" level=error msg="Failed to execute CLI handler, Err: Unable to configure multipathd service, err cannot determine if system is of type virtual machine, parameter not found in system information string: manufacturer" file="csi-driver.go:62"

Several things look wrong. The distro is actually Ubuntu, but the log first reports "got OS details as [redhat 9 2 5.15.0-1049-raspi]" and later "Distro section: Ubuntu , not present for deviceType: Nimble , using default config" (config.go:247).

Indeed, the path /sys/firmware/dmi/ does not exist on this host.

The "server" is a Raspberry Pi that is also booted over iSCSI. Could this be fixed before you release 2.4.1?

Also, a heads-up: I modified the kubelet path, because the one created by default was incorrect (it contained a double slash, //).

These are the Helm values from the deployment:

  values:
    hpe-csi-driver:
      kubeletRootDir: "/var/lib/kubelet"
    service:
      type: LoadBalancer
      port: 8080
    ingress:
      enabled: true
      className: contour
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
        kubernetes.io/tls-acme: "true"
        kubernetes.io/ingress.class: contour
      hosts:
        - host: mydomain
          paths:
            - path: /
              pathType: ImplementationSpecific
      tls:
      - secretName: truenas-csp-tls
        hosts:
        - mydomain

datamattsson commented 5 months ago

Hi! Thanks for filing this issue. I'm digging around and finding a few skeletons.

The workaround you need to apply here is to install the Helm chart with node configuration and node conformance disabled.

To apply hpe-csi-driver parameters to the TrueNAS CSP chart, you prefix the values with hpe-csi-driver, like --set hpe-csi-driver.disableNodeConformance=true --set hpe-csi-driver.disableNodeConfiguration=true. I seem to have missed documenting this, and I'm not in front of a computer to verify it.
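If the chart is deployed from a values file, like the one in the original report, the same two settings could presumably be nested under the existing hpe-csi-driver key instead of being passed with --set. This is a sketch that assumes the subchart reads disableNodeConformance and disableNodeConfiguration as booleans; the comments paraphrase what the flags are meant to do and should be checked against the chart's documentation.

```yaml
values:
  hpe-csi-driver:
    kubeletRootDir: "/var/lib/kubelet"
    # Assumed: stop the driver from installing iSCSI/multipath packages on the node
    disableNodeConformance: true
    # Assumed: stop the driver from configuring iSCSI/multipath on the node;
    # those must then be installed and configured manually
    disableNodeConfiguration: true
```

Keeping the keys in the values file means the workaround survives later upgrades without having to remember the extra --set flags.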

There won't be a 2.4.1 release of the TrueNAS CSP; we're working on 2.4.2 of the CSI driver because we uncovered an issue with 3PAR that needs immediate attention.

It won't include a fix for this issue, so the workaround is to manually install and configure iSCSI and multipath, and to disable node configuration and conformance.

jlpedrosa commented 5 months ago

Hi @datamattsson

Thanks for the help; the errors are gone now. Let me know if you need me to run any tests to help fix this more permanently. If the recommended approach is to disable those checks, shouldn't there be docs listing the required packages?

Thanks again!

datamattsson commented 5 months ago

> Let me know if you need me to run any tests to solve it in a more permanent way or if the recommended way is to disable those, then probably there should be docs about the packages required?

Here are the docs for the "disable" parameters: https://scod.hpedev.io/csi_driver/operations.html#manual_node_configuration

The permanent fix for this issue is to make this error non-fatal. I'll file an internal JIRA ticket for it.