Closed: juhis135 closed this issue 2 years ago.
Kubernetes version: Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"windows/amd64"} Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"} WARNING: version difference between client (1.23) and server (1.21) exceeds the supported minor version skew of +/-1
That sounds like a cluster issue in general; are you able to launch any pods on windows nodes via any means?
The error message doesn't have any language/keywords that indicate it is something about how Sonobuoy is launching things, just that the network may be misconfigured on the cluster for the Windows nodes.
@jayunit100 have you seen errors like that before for Windows EKS cluster?
I am able to deploy other pods successfully on the Windows nodes. I was able to deploy the sample application provided by AWS (https://docs.aws.amazon.com/eks/latest/userguide/sample-deployment.html) on the Windows nodes as well.
The issue is only with the pods that Sonobuoy creates on the Windows nodes while running the plugins.
Also, the Sonobuoy aggregator pod runs successfully on the Windows nodes only when I pass the argument --aggregator-node-selector="beta.kubernetes.io/os:windows". Without this flag, even the Sonobuoy pod throws the same error.
Also, I am using a mixed EKS cluster, with both Linux and Windows nodes.
OK, so presumably the aggregator has a problem launching on the Linux nodes (by specifying the aggregator selector you said it works fine, right?).
I'm not terribly familiar with EKS error messages but I can attempt to repro at some point.
It does just seem strange, since the error references the pod agnhost-primary-w2bd5 (or agnhost-primary-w2bd5_kubectl-4540) and the label vpc.amazonaws.com/PrivateIPv4Address.
My guess is that you don't have any Linux worker nodes, only control plane nodes; is that right? Sonobuoy tolerates the normal master node taints, but EKS may otherwise prevent you from launching pods there. Whatever that restriction is, it seems to keep Sonobuoy from launching and leads to the pod/label references in those error messages.
If that's the case and I can repro, then this should be added to the known issues or FAQ: even though Sonobuoy can run on either node type, you need to ensure that the Linux nodes are schedulable, or else provide the --aggregator-node-selector flag as you demonstrated.
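Roughly, as a sketch (the kubernetes.io/os node label and the jsonpath query are assumptions about a standard cluster; the selector value matches the one you used above):
# Check whether the Linux nodes accept regular pods; any taints listed here could block the aggregator.
kubectl get nodes -l kubernetes.io/os=linux \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
# Otherwise, pin the aggregator explicitly with the selector flag:
sonobuoy run --aggregator-node-selector "beta.kubernetes.io/os:windows"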
If it's not too much trouble and you happen to have a tarball from a Sonobuoy run, it would help in understanding the situation since it contains logs and API object information.
Thanks.
We have 2 Linux nodes and 2 Windows nodes in the EKS cluster. Even when the Sonobuoy pod is created on a Linux node, the other pods that it creates for the Windows nodes end up throwing the same error.
I am a bit hesitant to share the whole tar file because it might contain some project info. Is there any specific file from the tar that you are looking for? I can verify and provide that specific file.
A few that come to mind would be:
- meta/* would contain basic info about the Sonobuoy definitions, plugins, etc. If you don't want to share all of that, then meta/run.log (the aggregator logs) would be useful.
- plugins/* would be useful since that contains the plugin specifications and results/errors.
- podlogs/sonobuoy/* would contain any pod logs for pods that were created in that namespace.
- resources/ns/sonobuoy/* would contain things like pod info (API info), events, etc. that may be relevant for their scheduling.
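If it's easier, a quick sketch of grabbing just those paths out of the results archive (the archive name below is a placeholder for whatever sonobuoy retrieve downloads):
# Download the results tarball from the aggregator into the current directory.
sonobuoy retrieve .
# Extract only the paths mentioned above; replace the archive name with the actual file.
tar -xzf <your-results>.tar.gz meta/run.log plugins podlogs/sonobuoy resources/ns/sonobuoy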
Hi,
It would not be possible to share the files due to project restrictions.
Would it be possible for you to reproduce this issue on your end using EKS 1.21 with Windows nodes?
One of the other teams is also getting the same issue for Windows nodes in EKS.
I'm happy to try and repro. How are you able to get EKS with Windows nodes? When I'm adding a node group I only have Linux and Bottlerocket choices.
Update: sorry, I see from https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html that Windows support is an opt-in feature. I'll try and follow this and see how it goes.
So:
At first, I found I left off this line from the instructions:
eksctl utils install-vpc-controllers --cluster my-cluster --approve
But then I deleted and recreated my Windows node group and tried again. Same issue though.
However, I'll try to help resolve this another way: those instructions were for the legacy Windows support method, and there should be a newer method that may not hit the same VPC issue. I'll let you know.
Confirmed that following the instructions for Windows support on EKS (not legacy Windows support) worked. The IAM role tagged the pod with the IPv4 address as (apparently) expected:
vpc.amazonaws.com/PrivateIPv4Address: 192.168.90.24/19
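For anyone following along, my understanding is that the current (non-legacy) instructions come down to enabling Windows IPAM in the VPC CNI ConfigMap. A sketch of that step, worth double-checking against the linked AWS doc for your cluster version:
# Enable Windows IPAM in the VPC CNI config so Windows pods get VPC IPs assigned.
kubectl patch configmap/amazon-vpc-cni -n kube-system --type merge \
  -p '{"data":{"enable-windows-ipam":"true"}}'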
Thanks for reproducing the issue.
Just wanted to add: if you are using a Kubernetes version newer than 1.17, you need to follow the steps in the "Enabling Windows support" section of the AWS document https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html.
The instruction you mentioned above (eksctl utils install-vpc-controllers --cluster my-cluster --approve) is required only for Kubernetes versions older than 1.17.
But we are getting the same errors following either set of instructions.
Also, I was able to run the pods for the sample application mentioned in the AWS document on the Windows nodes: https://docs.aws.amazon.com/eks/latest/userguide/sample-deployment.html
It would be a great help if you could find something to resolve this issue.
Hi all. The main issue here is the way EKS networking works, especially the networking of the Windows workers. The related issue can be found here: https://github.com/aws/containers-roadmap/issues/463. In order to resolve this, we need the following node selectors on each pod that should be scheduled on Windows workers (each test case):
nodeSelector:
  kubernetes.io/os: windows
  kubernetes.io/arch: amd64
By having the above node selectors, the VPC CNI controller should be able to assign the IPs properly so the pods can run.
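To illustrate, a minimal sketch of a pod that should land on a Windows worker with those selectors (the pod name and image below are placeholders I picked, not from this thread; use an image that matches your node's Windows version):
# Hypothetical smoke-test pod: schedules onto a Windows amd64 worker via the node selectors above.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: windows-selector-check
spec:
  nodeSelector:
    kubernetes.io/os: windows
    kubernetes.io/arch: amd64
  containers:
  - name: pause
    image: mcr.microsoft.com/oss/kubernetes/pause:3.6   # placeholder; match your node's Windows version
EOF
If a pod like that schedules and gets an IP, the CNI side is working and the remaining gap is getting the same selectors onto the test pods that Sonobuoy/e2e creates.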
@johnSchnake - Can we please try the solution suggested by Danijel in the previous comment and update the test cases with the relevant node selectors for Windows nodes?
@johnSchnake Hi, I'm experiencing the same issue here. Is the proposed solution of adding a nodeSelector section to each test pod going to be carried out on the Sonobuoy side? Or is there any quick fix I can do myself to run Sonobuoy on an EKS cluster with Windows nodes? Currently, the aggregator can run on the mixed-node cluster in EKS, but all the tests fail with the above error "pod does not have label vpc.amazonaws.com/PrivateIPv4Address".
@jiechen0826 just filed an upstream issue here https://github.com/kubernetes/kubernetes/issues/119022
I tried running the below commands:
sonobuoy run --plugin-env=e2e.E2E_EXTRA_ARGS='--progress-report-url=http://localhost:8099/progress --node-os-distro=windows' --plugin=win-e2e-image-repo-list-master.yml --security-context-mode=none --aggregator-node-selector="beta.kubernetes.io/os:windows"
sonobuoy run --plugin 'win-e2e-image-repo-list-master.yml' --security-context-mode=none --wait --aggregator-node-selector "beta.kubernetes.io/os:windows"
Error I am getting : Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "4054eb01f3637b53bbac466e6329fee6cb30cd9aff849de7631edb12fea5dccf" network for pod "agnhost-primary-w2bd5": networkPlugin cni failed to set up pod "agnhost-primary-w2bd5_kubectl-4540" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address
This is coming for all the pods created by Sonobuoy for running the plugins.
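A quick way to look at the scheduling/network events for one of the failing pods (the pod name and namespace here are taken from the error above and will differ per run):
# Inspect the failing e2e pod and recent events in its namespace.
kubectl describe pod agnhost-primary-w2bd5 -n kubectl-4540
kubectl get events -n kubectl-4540 --sort-by=.lastTimestamp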
What did you expect to happen: The plugins should run on the Windows node group.
Anything else you would like to add: I am using EKS 1.21 with Linux and Windows nodes.
Environment:
Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"windows/amd64"}