harvester / harvester

Open source hyperconverged infrastructure (HCI) software
https://harvesterhci.io/
Apache License 2.0

[BUG] Longhorn V2 disks are added successfully, but show webhook errors in the UI, and in BD status conditions #6709

Open tserong opened 2 weeks ago

tserong commented 2 weeks ago

Describe the bug

When adding an additional disk to a node with the provisioner set to Longhorn V2, the disk is added successfully, but the UI later shows admission webhook errors, for example:

[image: screenshot of the admission webhook error shown in the UI]

After rebooting the host, we see a different error:

[image: screenshot of the different error seen after the reboot]

...and the corresponding block device status shows the AddedToNode condition is false, even though the disk is provisioned successfully:

# kubectl get bds -n longhorn-system -o yaml
[...]
  spec:
    [...]
    provision: true
    provisioner:
      longhorn:
        engineVersion: LonghornV2
  status:
    conditions:
    - lastUpdateTime: "2024-10-04T07:36:57Z"
      message: 'Internal error occurred: failed calling webhook "mutator.longhorn.io":
        failed to call webhook: Post "https://longhorn-admission-webhook.longhorn-system.svc:9502/v1/webhook/mutation?timeout=10s":
        proxy error from 127.0.0.1:9345 while dialing 10.52.0.174:9502, code 502:
        502 Bad Gateway'
      reason: Error
      status: "False"
      type: AddedToNode
    [...]
    provisionPhase: Provisioned
    state: Active
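
The mismatch above (provisionPhase is Provisioned while AddedToNode is "False") can be checked for across all block devices with a short script. This is only a rough sketch for triage, not part of node-disk-manager: the function names `find_stale_conditions` and `main` are made up for illustration, and it assumes `kubectl` is on the PATH and that the JSON output has the same fields as the YAML shown above.

```python
import json
import subprocess


def find_stale_conditions(bd):
    """Return AddedToNode conditions that are "False" on a provisioned device.

    `bd` is one block device object as a dict, with the same layout as the
    YAML in this report (status.provisionPhase, status.conditions).
    """
    status = bd.get("status", {})
    if status.get("provisionPhase") != "Provisioned":
        return []
    return [
        cond
        for cond in status.get("conditions", [])
        if cond.get("type") == "AddedToNode" and cond.get("status") == "False"
    ]


def main():
    # Same query as in the report, but as JSON so the standard library can parse it.
    out = subprocess.run(
        ["kubectl", "get", "bds", "-n", "longhorn-system", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    for bd in json.loads(out).get("items", []):
        name = bd.get("metadata", {}).get("name", "<unknown>")
        for cond in find_stale_conditions(bd):
            print(f"{name}: {cond['type']}={cond['status']}: {cond.get('message', '')}")
```

Calling `main()` on a node with cluster access prints one line per affected block device; `find_stale_conditions` can also be used on its own against a saved dump.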

To Reproduce

Steps to reproduce the behavior:

  1. Add a new disk to a host using the Longhorn V2 provisioner.
  2. Wait and observe the errors described above.

Expected behavior

The disk is added successfully, and no webhook errors appear in the UI or in the block device status conditions.

Environment

Additional context

I thought I'd fixed this with https://github.com/harvester/node-disk-manager/pull/142, but apparently I didn't :-/

harvesterhci-io-github-bot commented 1 week ago

Pre Ready-For-Testing Checklist

harvesterhci-io-github-bot commented 1 week ago

Automation e2e test issue: harvester/tests#1587