ovn-org / libovsdb

An OVSDB Client Library written in Golang
Apache License 2.0

bug: error when trying to initialize libovsdb NB client: no space left on device #372

Open Dr0p42 opened 10 months ago

Dr0p42 commented 10 months ago

Hello, while upgrading an OKD cluster from 4.11.0-0.okd-2022-10-28-153352 to 4.11.0-0.okd-2022-12-02-145640 I got the following error:

F1124 20:44:04.871535       1 ovnkube.go:133] error when trying to initialize libovsdb NB client: no space left on device

This occurred while there was still a lot of space on the master node. After a lot of testing, I started to suspect that it was not a storage issue but a port binding issue, and I saw that there was an ovnkube-node already running on the same machine. So I tried to:

After that, the master was finally able to boot and get past that error.

I don't know whether the error message "error when trying to initialize libovsdb NB client: no space left on device", which I find misleading, could be updated.

I think the log message is being displayed from this line: https://github.com/ovn-org/ovn-kubernetes/blob/ac6820df0b338a246f10f412cd5ec903bd234694/go-controller/cmd/ovnkube/ovnkube.go#L486

But I see that the code just prints the error as is. So I guess that if something can be done, it would be in this repo, which is why I am opening the issue here.

I can provide more logs if needed. Best, Maxime

halfcrazy commented 10 months ago

This occurred while there was still a lot of space on the master node.

Have you checked inodes? Does the pod mount a hostPath or use a PV?

Dr0p42 commented 10 months ago

Hello @halfcrazy, the pod mounts its volumes using hostPath, not a PV.

These are the volumes and volumeMounts:

volumeMounts:

      volumeMounts:
      # hostPath
        - name: systemd-units
          readOnly: true
          mountPath: /etc/systemd/system
        - name: etc-openvswitch
          mountPath: /etc/openvswitch/
        - name: etc-openvswitch
          mountPath: /etc/ovn/
        - name: var-lib-openvswitch
          mountPath: /var/lib/openvswitch/
        - name: run-openvswitch
          mountPath: /run/openvswitch/
        - name: run-ovn
          mountPath: /run/ovn/
        - name: ovnkube-config
          mountPath: /run/ovnkube-config/
        - name: env-overrides
          mountPath: /env
        - name: ovn-cert
          mountPath: /ovn-cert
        - name: ovn-ca
          mountPath: /ovn-ca
        - name: kube-api-access-qgltv
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount

volumes:

  volumes:
    - name: systemd-units
      hostPath:
        path: /etc/systemd/system
        type: ''
    - name: etc-openvswitch
      hostPath:
        path: /var/lib/ovn/etc
        type: ''
    - name: var-lib-openvswitch
      hostPath:
        path: /var/lib/ovn/data
        type: ''
    - name: run-openvswitch
      hostPath:
        path: /var/run/openvswitch
        type: ''
    - name: run-ovn
      hostPath:
        path: /var/run/ovn
        type: ''
    - name: ovnkube-config
      configMap:
        name: ovnkube-config
        defaultMode: 420
    - name: env-overrides
      configMap:
        name: env-overrides
        defaultMode: 420
        optional: true
    - name: ovn-ca
      configMap:
        name: ovn-ca
        defaultMode: 420
    - name: ovn-cert
      secret:
        secretName: ovn-cert
        defaultMode: 420
    - name: ovn-master-metrics-cert
      secret:
        secretName: ovn-master-metrics-cert
        defaultMode: 420
        optional: true
    - name: kube-api-access-qgltv
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
          - configMap:
              name: openshift-service-ca.crt
              items:
                - key: service-ca.crt
                  path: service-ca.crt
        defaultMode: 420

I also checked crio.conf and /etc/containers/storage.conf, mainly looking for a storage limit on the overlayfs, but found nothing relevant.


Regarding the ports, a conflict does not seem to make sense either, as there are no conflicting ports between ovnkube-master and ovnkube-node. I can share those YAML files if you want.


Would it be possible that ovnkube-node was interacting with a file in the hostPath that ovnkube-master is also using? So that when I killed ovnkube-master and then ovnkube-node, that file just got released or something?

I am sorry, I really don't know this project well. Let me know if I can share anything that would help you understand all of this better.