kubernetes-sigs / image-builder

Tools for building Kubernetes disk images
https://image-builder.sigs.k8s.io/
Apache License 2.0

Azure windows-2025 builds failing: timeout during goss tests #1603

Closed: mboersma closed this issue 1 month ago

mboersma commented 1 month ago

The pull-azure-sigs job has failed often since #1527 was merged, due to a timeout during the goss test stage:

    azure-arm.sig-windows-2025-containerd: }
    azure-arm.sig-windows-2025-containerd: Error: timeout of 3m0s reached before tests entered a passing state
==> azure-arm.sig-windows-2025-containerd: Provisioning step had errors: Running the cleanup provisioner, if present...
==> azure-arm.sig-windows-2025-containerd: 

This is blocking CI and keeping PRs from merging, so we may want to skip this test for now unless a solution is imminent.


/kind bug

mboersma commented 1 month ago

The specific goss failures are these:

azure-arm.sig-windows-2025-containerd:         {
azure-arm.sig-windows-2025-containerd:             "duration": 30009575200,
azure-arm.sig-windows-2025-containerd:             "err": {},
azure-arm.sig-windows-2025-containerd:             "expected": null,
azure-arm.sig-windows-2025-containerd:             "found": null,
azure-arm.sig-windows-2025-containerd:             "human": "",
azure-arm.sig-windows-2025-containerd:             "meta": null,
azure-arm.sig-windows-2025-containerd:             "property": "exit-status",
azure-arm.sig-windows-2025-containerd:             "resource-id": "Windows Feature - Hyper-V-PowerShell",
azure-arm.sig-windows-2025-containerd:             "resource-type": "Command",
azure-arm.sig-windows-2025-containerd:             "result": 1,
azure-arm.sig-windows-2025-containerd:             "successful": false,
azure-arm.sig-windows-2025-containerd:             "summary-line": "Windows Feature - Hyper-V-PowerShell: exit-status: Error: Command execution timed out (30s)",
azure-arm.sig-windows-2025-containerd:             "test-type": 1,
azure-arm.sig-windows-2025-containerd:             "title": ""
azure-arm.sig-windows-2025-containerd:         },
azure-arm.sig-windows-2025-containerd:         {
azure-arm.sig-windows-2025-containerd:             "duration": 0,
azure-arm.sig-windows-2025-containerd:             "err": {},
azure-arm.sig-windows-2025-containerd:             "expected": null,
azure-arm.sig-windows-2025-containerd:             "found": null,
azure-arm.sig-windows-2025-containerd:             "human": "",
azure-arm.sig-windows-2025-containerd:             "meta": null,
azure-arm.sig-windows-2025-containerd:             "property": "stdout",
azure-arm.sig-windows-2025-containerd:             "resource-id": "Windows Feature - Hyper-V-PowerShell",
azure-arm.sig-windows-2025-containerd:             "resource-type": "Command",
azure-arm.sig-windows-2025-containerd:             "result": 1,
azure-arm.sig-windows-2025-containerd:             "successful": false,
azure-arm.sig-windows-2025-containerd:             "summary-line": "Windows Feature - Hyper-V-PowerShell: stdout: Error: Command execution timed out (30s)",
azure-arm.sig-windows-2025-containerd:             "test-type": 2,
azure-arm.sig-windows-2025-containerd:             "title": ""
azure-arm.sig-windows-2025-containerd:         }

The same failures appear for the Windows feature Containers. However, the "Add required Windows Features" task completed without error; goss simply times out while trying to verify that the features are enabled.

mboersma commented 1 month ago

I doubled the goss command timeout to 60000 ms, but got the same failure locally.

https://github.com/kubernetes-sigs/image-builder/blob/95d148386f0eb49c98a067c707aa888c8994dee2/images/capi/packer/goss/goss-package.yaml#L77-L93

We also increased the overall job timeout in kubernetes/test-infra#33668, without success.

TinaMor commented 1 month ago

@mboersma This is a known issue with WS2025 (Windows Server 2025). It works after a retry.
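The failure above involves two nested timeouts: each goss command is killed after its own timeout (30 s in the log), and the whole suite is re-run until it passes or the outer retry window (3m0s) expires. The Python sketch below models that interaction with a fake clock. It is an assumption-based model of goss's retry behaviour (retry while elapsed time plus the sleep interval stays under the retry timeout), not goss source code. It also assumes a 1 s sleep between attempts and that each failed attempt costs roughly one command timeout, since the checks run concurrently. `FakeClock`, `validate_with_retry`, and `make_check` are hypothetical names.

```python
from typing import Callable


class FakeClock:
    """Deterministic stand-in for wall-clock time (hypothetical helper)."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def validate_with_retry(run_suite: Callable[[FakeClock], bool],
                        retry_timeout: float, sleep: float,
                        clock: FakeClock) -> int:
    """Re-run the suite until it passes; return how many attempts it took.

    Assumed semantics: retry only while elapsed + sleep is below
    retry_timeout, otherwise give up with a timeout error.
    """
    start = clock.now
    attempts = 0
    while True:
        attempts += 1
        if run_suite(clock):
            return attempts
        if (clock.now - start) + sleep >= retry_timeout:
            raise TimeoutError(
                f"timeout of {retry_timeout:g}s reached "
                "before tests entered a passing state")
        clock.sleep(sleep)


def make_check(command_timeout: float,
               hangs_for: int) -> Callable[[FakeClock], bool]:
    """Hypothetical feature check: hangs (and is killed at command_timeout)
    on its first `hangs_for` runs, then answers quickly and passes."""
    runs = 0

    def check(clock: FakeClock) -> bool:
        nonlocal runs
        runs += 1
        if runs <= hangs_for:
            clock.sleep(command_timeout)  # command killed at its timeout
            return False
        clock.sleep(2.0)  # assumed fast, successful feature query
        return True

    return check


if __name__ == "__main__":
    # Transient hang ("works after a retry"): passes on a later attempt.
    print(validate_with_retry(make_check(30, hangs_for=1), 180, 1, FakeClock()))
    # Persistent hang: every attempt burns a full command timeout, so the
    # outer 3-minute window expires before the suite can pass.
    try:
        validate_with_retry(make_check(30, hangs_for=10**6), 180, 1, FakeClock())
    except TimeoutError as err:
        print(err)
```

Under this model, a command that hangs persistently gets about six attempts inside the 3-minute window with a 30 s command timeout, and only three with a 60 s timeout, so raising the command timeout alone does not help; a hang that clears on its own passes on the next attempt.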