nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Create AWS AMI(s) with Ubuntu + Shiftfs + Sysbox #538

Open ctalledo opened 2 years ago

ctalledo commented 2 years ago

Several Sysbox users have asked for AWS AMIs that include Ubuntu + Shiftfs + Sysbox. Such AMIs would make it easier for them to create AWS EC2 VMs with Sysbox preinstalled.

This Sysbox discussion thread provides information on how to do it. Nestybox should look into creating such AMI(s) for Sysbox users.

ctalledo commented 2 years ago

Here is a sample Packer script for building an AMI with pre-installed Sysbox based on an Ubuntu EKS AMI, courtesy of @maximsmol.

https://github.com/latchbio/sysbox-eks-ami

shinji62 commented 2 years ago

@ctalledo I tried to use the AMI from @maximsmol but it just doesn't work. I'm not sure whether it uses containerd as well. The AWS CNI just fails every time.

ctalledo commented 2 years ago

Hi @shinji62,

The AWS CNI just fails every time.

How does it fail? (e.g., what happens when you kubectl describe the failing pod).

I've not tried the AMI from @maximsmol myself, but I believe he is actively using it.

shinji62 commented 2 years ago

Some parts are working, for example kube-proxy is working properly, but aws-node is failing with:

Events:
  Type     Reason     Age                     From                                                     Message
  ----     ------     ----                    ----                                                     -------
  Warning  BackOff    6m12s (x2954 over 16h)  kubelet, ip-10-10-10-93.ap-northeast-1.compute.internal  Back-off restarting failed container
  Warning  Unhealthy  75s (x3279 over 16h)    kubelet, ip-10-10-10-93.ap-northeast-1.compute.internal  (combined from similar events): Readiness probe failed: {"level":"info","ts":"2022-05-24T00:49:21.513Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
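The readiness probe message above says that nothing answered on port 50051 within 1s. As a rough way to confirm this from the node itself, here is a minimal sketch that attempts a plain TCP connection to that port. It assumes Python 3 is available on the node and that the service listens on 127.0.0.1; neither is stated in the thread, and `port_is_open` is a hypothetical helper name. A bare TCP connect is only a stand-in for the real probe, which performs a health check rather than just opening a socket.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 50051 is the port named in the probe message; 127.0.0.1 is an assumption.
    state = "open" if port_is_open("127.0.0.1", 50051) else "closed or unreachable"
    print(f"127.0.0.1:50051 is {state}")
```

If the port is reported as closed, the daemon behind it most likely never started or exited early, which would be consistent with a probe that keeps timing out.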

Logs from the pod:

aws-vpc-cni-init + '[' false == true ']'
aws-vpc-cni-init + sysctl -e -w net.ipv4.tcp_early_demux=1
aws-vpc-cni-init net.ipv4.tcp_early_demux = 1
aws-vpc-cni-init + echo 'CNI init container done'
aws-vpc-cni-init CNI init container done
aws-vpc-cni-init stream closed
aws-node {"level":"info","ts":"2022-05-24T00:50:02.144Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
aws-node {"level":"info","ts":"2022-05-24T00:50:02.146Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
aws-node {"level":"info","ts":"2022-05-24T00:50:02.192Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
aws-node {"level":"info","ts":"2022-05-24T00:50:02.194Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}

One thing I found is that the Docker daemon is not running on the node where this is failing.
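Whether a stopped Docker daemon matters depends on which container runtime the node actually uses, which this thread does not confirm. A minimal sketch for checking the state of the relevant services with `systemctl is-active` follows. The unit names `docker`, `containerd`, and `kubelet` are assumptions about how the services are named on the AMI, and `unit_state` is a hypothetical helper.

```python
import subprocess


def unit_state(unit: str, run=subprocess.run) -> str:
    """Return the state printed by `systemctl is-active <unit>`, or "unknown"."""
    try:
        result = run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            check=False,  # is-active exits non-zero for inactive units; that is not an error here
        )
    except FileNotFoundError:
        return "unknown"  # systemctl is not installed on this host
    return result.stdout.strip() or "unknown"


if __name__ == "__main__":
    # Unit names are assumptions; adjust them to match the node.
    for unit in ("docker", "containerd", "kubelet"):
        print(f"{unit}: {unit_state(unit)}")
```

If `containerd` is active while `docker` is inactive, the node may simply be configured to use containerd, in which case the stopped Docker daemon would not by itself explain the CNI failure.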

To be honest, I don't really understand the change here: https://github.com/latchbio/sysbox-eks-ami/blob/master/sysbox-eks.pkr.hcl#L230. That may be the cause of the issue.

I tried running the support script for AWS:

Trying to collect common operating system logs...
Trying to collect kernel logs...
Trying to collect mount points and volume information...
Trying to collect SELinux status...
Trying to collect iptables information...
Trying to collect installed packages...
Trying to collect active system services...
Trying to Collect Containerd daemon information...      Timed out, ignoring "containerd info output "

Trying to collect Docker daemon information...

        Warning: The Docker daemon is not running.

Trying to collect kubelet information... error: write /dev/stdout: permission denied

Trying to collect L-IPAMD introspection information... Trying to collect L-IPAMD prometheus metrics... Trying to collect L-IPAMD checkpoint... cp: cannot stat '/var/run/aws-node/ipam.json': No such file or directory

Trying to collect sysctls information...
Trying to collect networking infomation... conntrack v1.4.5 (conntrack-tools): 193 flow entries have been shown.
timeout: failed to run command 'ifconfig': No such file or directory

Trying to collect CNI configuration information...
Trying to collect Docker daemon logs...
Trying to archive gathered information...

shinji62 commented 2 years ago

My guess is that the Sysbox installer does far more than the AMI provided by @maximsmol does, so I guess I will have to wait until @ctalledo or @rodnymolina provides an image that just works.