kubernetes / kubernetes


[Flaking Test] capz-windows-master #124146

Closed pacoxu closed 3 months ago

pacoxu commented 3 months ago

Which jobs are failing?

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-capz-master-windows/1774979825824436224

Which tests are failing?

Apr  2 02:15:00.273: INFO: Dumping workload cluster default/capz-conf-d44i8z nodes
panic: Timed out after 180.001s.
Failed to get default/capz-conf-d44i8z-kubeconfig
Expected success, but got an error:
    <*errors.StatusError | 0xc001cdeaa0>: 
    secrets "capz-conf-d44i8z-kubeconfig" not found
    {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {
                SelfLink: "",
                ResourceVersion: "",
                Continue: "",
                RemainingItemCount: nil,
            },
            Status: "Failure",
            Message: "secrets \"capz-conf-d44i8z-kubeconfig\" not found",
            Reason: "NotFound",
            Details: {
                Name: "capz-conf-d44i8z-kubeconfig",
                Group: "",
                Kind: "secrets",
                UID: "",
                Causes: nil,
                RetryAfterSeconds: 0,
            },
            Code: 404,
        },
    }
goroutine 1 [running]:
main.Fail({0xc0002aee00?, 0x12?}, {0xc001a8d450?, 0x3?, 0x3?})
    /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/logger.go:38 +0x27
github.com/onsi/gomega/internal.(*AsyncAssertion).match.func3({0x3c2768c, 0x9})
    /home/prow/go/pkg/mod/github.com/onsi/gomega@v1.30.0/internal/async_assertion.go:478 +0x1d9
github.com/onsi/gomega/internal.(*AsyncAssertion).match(0xc0002c35e0, {0x4048870, 0x5954d80}, 0x1, {0xc0008ebba0, 0x2, 0x2})
    /home/prow/go/pkg/mod/github.com/onsi/gomega@v1.30.0/internal/async_assertion.go:560 +0xd45
github.com/onsi/gomega/internal.(*AsyncAssertion).Should(0xc0002c35e0, {0x4048870, 0x5954d80}, {0xc0008ebba0, 0x2, 0x2})
    /home/prow/go/pkg/mod/github.com/onsi/gomega@v1.30.0/internal/async_assertion.go:145 +0x86
sigs.k8s.io/cluster-api/test/framework.(*clusterProxy).getKubeconfig(0xc001198540?, {0x405b6e0, 0x5954d80}, {0x7ffc1207de10, 0x7}, {0x7ffc1207ddf3, 0x10})
    /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.6.3/framework/cluster_proxy.go:394 +0x2bd
sigs.k8s.io/cluster-api/test/framework.(*clusterProxy).GetWorkloadCluster(0xc0006be720, {0x405b6e0, 0x5954d80}, {0x7ffc1207de10, 0x7}, {0x7ffc1207ddf3, 0x10}, {0x0, 0x0, 0x0})
    /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.6.3/framework/cluster_proxy.go:288 +0x205
sigs.k8s.io/cluster-api-provider-azure/test/e2e.(*AzureClusterProxy).collectNodes(0x3c1e7ee?, {0x405b6e0, 0x5954d80}, {0x7ffc1207de10?, 0xc001e87d30?}, {0x7ffc1207ddf3?, 0x2?}, {0xc000435140, 0x29})
    /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_clusterproxy.go:182 +0x68
sigs.k8s.io/cluster-api-provider-azure/test/e2e.(*AzureClusterProxy).CollectWorkloadClusterLogs(0xc00007be00, {0x405b6e0, 0x5954d80}, {0x7ffc1207de10, 0x7}, {0x7ffc1207ddf3, 0x10}, {0xc000435140, 0x29})
    /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_clusterproxy.go:84 +0x28a
main.main()
    /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/logger.go:78 +0x65d
exit status 2
/home/prow/go/src/sigs.k8s.io/windows-testing
================ REDACTING LOGS ================
All sensitive variables are redacted
Tue, 02 Apr 2024 02:18:01 +0000: deleting cluster
+ EXIT_VALUE=3
+ set +o xtrace
Cleaning up after docker in docker.

Since when has it been failing?

today

Testgrid link

https://testgrid.k8s.io/sig-release-master-informing#capz-windows-master

Reason for failure (if possible)

See above
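
Reading the panic above: the log collector polled for the `capz-conf-d44i8z-kubeconfig` secret, kept getting a 404 NotFound, and gave up after 180 seconds. The stack trace shows the real code doing this through Gomega's async assertion in `getKubeconfig`. The snippet below is only a minimal stdlib sketch of that poll-until-timeout shape, not the cluster-api implementation; `waitForSecret`, `getSecretFunc`, and `errNotFound` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotFound is a hypothetical stand-in for the API server's 404 NotFound error.
var errNotFound = errors.New(`secrets "capz-conf-d44i8z-kubeconfig" not found`)

// getSecretFunc is a hypothetical lookup; real code would query the management cluster.
type getSecretFunc func(namespace, name string) ([]byte, error)

// waitForSecret calls get every interval until it succeeds or timeout has elapsed.
// On timeout it returns an error that wraps the last lookup error.
func waitForSecret(get getSecretFunc, namespace, name string, timeout, interval time.Duration) ([]byte, error) {
	deadline := time.Now().Add(timeout)
	for {
		data, err := get(namespace, name)
		if err == nil {
			return data, nil
		}
		if time.Now().After(deadline) {
			return nil, fmt.Errorf("timed out after %s waiting for %s/%s: %w", timeout, namespace, name, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	// A lookup that never succeeds, as if the secret had already been deleted
	// or was never created.
	neverFound := func(namespace, name string) ([]byte, error) {
		return nil, errNotFound
	}

	_, err := waitForSecret(neverFound, "default", "capz-conf-d44i8z-kubeconfig",
		50*time.Millisecond, 10*time.Millisecond)
	fmt.Println(err)
	fmt.Println("last error was NotFound:", errors.Is(err, errNotFound))
}
```

Wrapping the last error with `%w` keeps the NotFound cause visible to the caller, which would let a log-collection step decide to skip the node dump instead of panicking when the cluster is already gone.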

Anything else we need to know?

Azure CAPZ had some updates today.

https://github.com/kubernetes-sigs/cloud-provider-azure/compare/15fadde40d38b5fe45ddd5aa62f20b5aa2abbeca...1ccb37b582b45de0a8a3e9b89d9abb23623edd79

Relevant SIG(s)

/sig windows cluster-lifecycle
/area provider/azure

k8s-ci-robot commented 3 months ago

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
pacoxu commented 3 months ago

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-capz-master-windows/1775025123871428608
last run passed
/close

k8s-ci-robot commented 3 months ago

@pacoxu: Closing this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/124146#issuecomment-2031156917):

> https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-capz-master-windows/1775025123871428608
> last run passed
> /close

pacoxu commented 3 months ago

@MartinForReal I saw your update in https://github.com/kubernetes-sigs/cloud-provider-azure/compare/1ccb37b582b45de0a8a3e9b89d9abb23623edd79...356139ec1b7a030634ec0f25dd48bfd6a79941ea.

Is that related?

pacoxu commented 3 months ago

/reopen

k8s-ci-robot commented 3 months ago

@pacoxu: Reopened this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/124146#issuecomment-2032264114):

> /reopen

jsturtevant commented 3 months ago

The last failure looks to be related to a tool used to configure the tests:

installing Azure CLI
Get:1 http://deb.debian.org/debian bookworm InRelease [151 kB]
Get:2 https://download.docker.com/linux/debian bookworm InRelease [43.3 kB]
Get:3 http://deb.debian.org/debian bookworm-updates InRelease [55.4 kB]
Get:4 http://deb.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Get:5 http://deb.debian.org/debian bookworm/main amd64 Packages [8786 kB]
Get:6 https://download.docker.com/linux/debian bookworm/stable amd64 Packages [19.7 kB]
Get:7 http://deb.debian.org/debian bookworm-updates/main amd64 Packages [12.7 kB]
Get:8 http://deb.debian.org/debian-security bookworm-security/main amd64 Packages [150 kB]
Fetched 9266 kB in 2s (4805 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
ca-certificates is already the newest version (20230311).
curl is already the newest version (7.88.1-10+deb12u5).
apt-transport-https is already the newest version (2.6.1).
lsb-release is already the newest version (12.0-1).
gnupg is already the newest version (2.2.40-1.1).
gnupg set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ bookworm main
Hit:1 https://download.docker.com/linux/debian bookworm InRelease
Hit:2 http://deb.debian.org/debian bookworm InRelease
Hit:3 http://deb.debian.org/debian bookworm-updates InRelease
Get:4 https://packages.microsoft.com/repos/azure-cli bookworm InRelease [3575 B]
Hit:5 http://deb.debian.org/debian-security bookworm-security InRelease
Get:6 https://packages.microsoft.com/repos/azure-cli bookworm/main amd64 Packages [1100 B]
Err:6 https://packages.microsoft.com/repos/azure-cli bookworm/main amd64 Packages
  File has unexpected size (1161 != 1100). Mirror sync in progress? [IP: 13.107.246.38 443]
  Hashes of expected file:
   - Filesize:1100 [weak]
   - SHA512:dfe4412ef50567887ae10f815b017c95f98eef516bab53e1fd392e9012f891857da597253466e33fc1786a0f707ba75612c5530936e6bf30fd4ded9a8b567551
   - SHA256:dbdc0cb77a09e8835195f500ef6c61545209c6ec68cdb8333a3b2a4f7c5f6699
  Release file created at: Tue, 05 Mar 2024 03:15:40 +0000
Fetched 3575 B in 1s (6064 B/s)
Reading package lists...
E: Failed to fetch https://packages.microsoft.com/repos/azure-cli/dists/bookworm/main/binary-amd64/Packages.gz  File has unexpected size (1161 != 1100). Mirror sync in progress? [IP: 13.107.246.38 443]
   Hashes of expected file:
    - Filesize:1100 [weak]
    - SHA512:dfe4412ef50567887ae10f815b017c95f98eef516bab53e1fd392e9012f891857da597253466e33fc1786a0f707ba75612c5530936e6bf30fd4ded9a8b567551
    - SHA256:dbdc0cb77a09e8835195f500ef6c61545209c6ec68cdb8333a3b2a4f7c5f6699
   Release file created at: Tue, 05 Mar 2024 03:15:40 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.
/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/ensure-azcli.sh: line 28: az: command not found
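
The log suggests the package index fetch hit a mirror that was mid-sync, after which `az` was missing when `ensure-azcli.sh` tried to call it. Transient failures like this are often absorbed by retrying the step with a growing delay. The snippet below is a minimal sketch of that idea, assuming the index refresh can be wrapped in a function; `retry` and `errMirrorSync` are hypothetical names, and nothing here reflects what `ensure-azcli.sh` actually does.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errMirrorSync is a hypothetical stand-in for the apt "unexpected size" failure.
var errMirrorSync = errors.New("file has unexpected size (1161 != 1100), mirror sync in progress?")

// retry runs fn up to attempts times, doubling the delay after each failure.
// It returns nil on the first success, or the last error wrapped with the attempt count.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Hypothetical stand-in for the index refresh: fails twice, as if the
	// mirror were still syncing, then succeeds.
	refreshIndex := func() error {
		calls++
		if calls < 3 {
			return errMirrorSync
		}
		return nil
	}

	if err := retry(5, 10*time.Millisecond, refreshIndex); err != nil {
		fmt.Println("index refresh failed:", err)
		return
	}
	fmt.Printf("index refresh succeeded after %d attempts\n", calls)
}
```

A bounded retry only helps if the mirror finishes syncing within the retry window; a longer outage would still fail the job, just with a clearer error.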
marosset commented 3 months ago

It looks like this error went away on its own (we've had several successful runs without seeing it). @jsturtevant should we monitor for a little while longer?

pacoxu commented 3 months ago

/kind flake

pacoxu commented 3 months ago

This keeps flaking in https://testgrid.k8s.io/sig-release-master-informing#capz-windows-master

pacoxu commented 3 months ago

/close
as the root cause of the flaking is totally different from the issue description now.

k8s-ci-robot commented 3 months ago

@pacoxu: Closing this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/124146#issuecomment-2041348498):

> /close
> as the root cause of the flaking is totally different from the issue description now.