Open ctalledo opened 2 years ago
Here is a sample Packer script for building an AMI with pre-installed Sysbox based on an Ubuntu EKS AMI, courtesy of @maximsmol.
@ctalledo I tried to use the AMI from @maximsmol but it just doesn't work, and I'm not sure whether it uses containerd as well. The AWS CNI just fails every time.
Hi @shinji62,
The AWS CNI just fails every time.
How does it fail? (e.g., what happens when you run kubectl describe on the failing pod?)
I've not tried the AMI from @maximsmol myself, but I believe he is actively using it.
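For anyone gathering the same information, the describe output for the failing pods can be collected with a small script. This is only a minimal sketch: it assumes kubectl is on the PATH and that the AWS CNI pods run in the kube-system namespace with the label k8s-app=aws-node (the defaults in the standard AWS VPC CNI manifest, not something stated in this thread). The function names are hypothetical.

```python
import subprocess


def describe_command(namespace="kube-system", selector="k8s-app=aws-node"):
    """Build the kubectl command that describes the pods matching a label selector.

    The namespace and selector defaults are assumptions based on the standard
    AWS VPC CNI manifest; adjust them if your cluster differs.
    """
    return ["kubectl", "describe", "pods", "-n", namespace, "-l", selector]


def run(command):
    """Run a command and return its stdout followed by its stderr as text."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The executable (e.g. kubectl) is not installed or not on the PATH.
        return f"{command[0]}: command not found\n"
    return result.stdout + result.stderr
```

Calling print(run(describe_command())) on a machine with access to the cluster prints the Events section for each aws-node pod, which is the part that shows why the container keeps restarting.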
Some parts are working; for example, kube-proxy is working properly, but aws-node is failing with:
Events:
  Type     Reason     Age                     From                                                      Message
  ----     ------     ----                    ----                                                      -------
  Warning  BackOff    6m12s (x2954 over 16h)  kubelet, ip-10-10-10-93.ap-northeast-1.compute.internal  Back-off restarting failed container
  Warning  Unhealthy  75s (x3279 over 16h)    kubelet, ip-10-10-10-93.ap-northeast-1.compute.internal  (combined from similar events): Readiness probe failed: {"level":"info","ts":"2022-05-24T00:49:21.513Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Logs from the pod:
aws-vpc-cni-init + '[' false == true ']'
aws-vpc-cni-init + sysctl -e -w net.ipv4.tcp_early_demux=1
aws-vpc-cni-init net.ipv4.tcp_early_demux = 1
aws-vpc-cni-init + echo 'CNI init container done'
aws-vpc-cni-init CNI init container done
aws-vpc-cni-init stream closed
aws-node {"level":"info","ts":"2022-05-24T00:50:02.144Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
aws-node {"level":"info","ts":"2022-05-24T00:50:02.146Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
aws-node {"level":"info","ts":"2022-05-24T00:50:02.192Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
aws-node {"level":"info","ts":"2022-05-24T00:50:02.194Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
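The readiness probe message and the last log line both point at the same symptom: nothing answers on port 50051 within one second. A rough way to confirm that from the node is a plain TCP connect with the same timeout. This is a sketch under assumptions: it assumes the IPAM daemon would listen on 127.0.0.1:50051 and that the check is run on the affected node (the host address is not given in the thread, only the port), and a successful TCP connect is weaker than the gRPC health check the probe performs. The function name is hypothetical.

```python
import socket


def can_connect(host="127.0.0.1", port=50051, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if can_connect():
        print("port 50051 is accepting connections")
    else:
        print("nothing is accepting connections on port 50051")
```

If this reports that nothing is accepting connections, the IPAM daemon most likely never finished starting, which matches the log stopping at "Checking for IPAM connectivity".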
One thing I found is that the Docker daemon is not running on the node where this is failing.
To be honest, I don't really understand the change here: https://github.com/latchbio/sysbox-eks-ami/blob/master/sysbox-eks.pkr.hcl#L230. That may be the cause of the issue.
I tried running the support script for AWS:
Trying to collect common operating system logs...
Trying to collect kernel logs...
Trying to collect mount points and volume information...
Trying to collect SELinux status...
Trying to collect iptables information...
Trying to collect installed packages...
Trying to collect active system services...
Trying to Collect Containerd daemon information... Timed out, ignoring "containerd info output "
Trying to collect Docker daemon information...
Warning: The Docker daemon is not running.
Trying to collect kubelet information... error: write /dev/stdout: permission denied
Trying to collect L-IPAMD introspection information... Trying to collect L-IPAMD prometheus metrics... Trying to collect L-IPAMD checkpoint... cp: cannot stat '/var/run/aws-node/ipam.json': No such file or directory
Trying to collect sysctls information...
Trying to collect networking infomation... conntrack v1.4.5 (conntrack-tools): 193 flow entries have been shown.
timeout: failed to run command 'ifconfig': No such file or directory
Trying to collect CNI configuration information...
Trying to collect Docker daemon logs...
Trying to archive gathered information...
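The failed steps are easy to miss in that output because they sit on the same line as the "Trying to ..." message. A small filter can pull them out. This is a minimal sketch: the list of failure markers is an illustrative assumption chosen to match the output pasted above, not something the support script defines, and the sample below is a subset of that output.

```python
import re

# Substrings that mark a failed step in the pasted output.
# This list is an assumption for illustration; extend it as needed.
PROBLEM_PATTERN = re.compile(
    r"timed out|warning:|error:|cannot stat|failed to run|no such file",
    re.IGNORECASE,
)


def problem_lines(output):
    """Return the lines of collector output that contain a known failure marker."""
    return [line.strip() for line in output.splitlines() if PROBLEM_PATTERN.search(line)]


SAMPLE = """\
Trying to collect active system services...
Trying to Collect Containerd daemon information... Timed out, ignoring "containerd info output "
Trying to collect Docker daemon information...
Warning: The Docker daemon is not running.
Trying to collect kubelet information... error: write /dev/stdout: permission denied
Trying to collect sysctls information...
"""

if __name__ == "__main__":
    for line in problem_lines(SAMPLE):
        print(line)
```

On the sample, the filter keeps the containerd timeout, the Docker warning, and the kubelet write error, which are the three lines that suggest the container runtime on the node is not healthy.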
My guess is that the Sysbox installer does far more than what is in the AMI provided by @maximsmol, so I guess I will have to wait until @ctalledo or @rodnymolina provides an image that just works.
Several Sysbox users have asked for AWS AMIs that include Ubuntu + Shiftfs + Sysbox. This makes it easier for them to create AWS EC2 VMs that have Sysbox in them.
This Sysbox discussion thread provides information on how to do it. Nestybox should look into creating such AMI(s) for Sysbox users.