spurin / diveintoansible-lab

Dive Into Ansible Lab

Ubuntu containers exit with code 255 #60

Closed yangzhuronghuang closed 2 years ago

yangzhuronghuang commented 2 years ago

I was using Amazon Linux 2 to run the Ansible lab and got the errors below:

Creating centos2 ... done
Creating docker ... done
Creating ubuntu3 ... done
Creating ubuntu2 ... done
Creating centos1 ... done
Creating ubuntu-c ... done
Creating centos3 ... done
Creating ubuntu1 ... done
Creating portal ... done
Attaching to ubuntu3, centos2, docker, ubuntu2, ubuntu-c, ubuntu1, centos1, centos3, portal
ubuntu3 exited with code 255
ubuntu2 exited with code 255
ubuntu-c exited with code 255
ubuntu1 exited with code 255
portal | 2021/09/03 22:18:51 [emerg] 1#1: host not found in upstream "ubuntu1" in /etc/nginx/conf.d/default.conf:46
portal | nginx: [emerg] host not found in upstream "ubuntu1" in /etc/nginx/conf.d/default.conf:46
portal exited with code 1

Below is my environment:

uname -r
4.14.165-103.209.amzn1.x86_64

Thanks in advance for your help!

spurin commented 2 years ago

Hi @yangzhuronghuang

Have a look at the main Readme on this repository and search for 255.

You'll see that the issue here references Ubuntu but I'm confident that it will be the same for running AWS Linux under WSL 2 (as it relates more to WSL 2 than the specific Linux version).

Please let me know if this resolves the issue and I'll update the wording accordingly.

Best Regards

James Spurin

yangzhuronghuang commented 2 years ago

Hi @spurin

I tried the fix, but it returned: mkdir: cannot create directory '/sys/fs/cgroup/systemd': No such file or directory

BTW, I searched on Google and found that WSL stands for Windows Subsystem for Linux. Since I am using Amazon Linux 2, which is based on RHEL, why is this related to the Windows subsystem?

Thanks!

spurin commented 2 years ago

Hi @yangzhuronghuang

Sorry, I thought you were using AWS Linux under WSL which seems to be an option also. This is where I've seen the 255 error before.

Could you elaborate on the setup?

Thanks

James

yangzhuronghuang commented 2 years ago

Hi @spurin

docker --version
Docker version 18.09.9-ce, build 039a7df
docker-compose --version
docker-compose version 1.22.0, build f46880fe

The Docker Compose YAML file version is 3.5.

What other setup do you need?

spurin commented 2 years ago

Hi @yangzhuronghuang

Are you running this in AWS on EC2?

yangzhuronghuang commented 2 years ago

Hi @spurin Yes

spurin commented 2 years ago

Okay,

Let's cover a few different things at the same time. Can you please provide the contents of your .env file?

Please also share the output of the mount command, run as it is with no arguments.

Lastly, please share anything further on how you set this up, in particular the specifics on AWS, so that I can recreate this myself if needed.

Thanks

James

yangzhuronghuang commented 2 years ago

Hi @spurin

.env is like this:

# sshd ports
UBUNTUC_PORT_SSHD=2221
UBUNTU1_PORT_SSHD=2222
UBUNTU2_PORT_SSHD=2223
UBUNTU3_PORT_SSHD=2224
CENTOS1_PORT_SSHD=2225
CENTOS2_PORT_SSHD=2226
CENTOS3_PORT_SSHD=2227

# ttyd (web terminal) ports
UBUNTUC_PORT_TTYD=7681
UBUNTU1_PORT_TTYD=7682
UBUNTU2_PORT_TTYD=7683
UBUNTU3_PORT_TTYD=7684
CENTOS1_PORT_TTYD=7685
CENTOS2_PORT_TTYD=7686
CENTOS3_PORT_TTYD=7687

# Shared config volume
CONFIG=/home/ec2-user/diveintoansible-lab/config

# Shared home directories
ANSIBLE_HOME=/home/ec2-user/diveintoansible-lab/ansible_home

mount output is like this:

[ec2-user@ip-172-31-90-92 diveintoansible-lab]$ mount
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
devtmpfs on /dev type devtmpfs (rw,relatime,size=493944k,nr_inodes=123486,mode=755)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /dev/shm type tmpfs (rw,relatime)
/dev/xvda1 on / type ext4 (rw,noatime,data=ordered)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=000)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
cgroup on /cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /cgroup/freezer type cgroup (rw,relatime,freezer)
cgroup on /cgroup/hugetlb type cgroup (rw,relatime,hugetlb)
cgroup on /cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /cgroup/perf_event type cgroup (rw,relatime,perf_event)
/dev/xvda1 on /var/lib/docker type ext4 (rw,noatime,data=ordered)

The AMI ID I used: ami-00043ff468d078003

After launching an EC2 instance, I:

  1. installed git: sudo yum install git
  2. cloned the code from the repo
  3. installed docker-compose 1.22: sudo curl -L https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
  4. made it executable: sudo chmod +x /usr/local/bin/docker-compose
  5. added ec2-user to the docker group: sudo usermod -aG docker $USER
  6. changed the version in the Docker Compose YAML file to 3.5

spurin commented 2 years ago

Hi @yangzhuronghuang

I thought that this might be related to cgroups, but that looks fine, like everything else in your configuration.

I'm in the UK and it's late here but tomorrow, I'll try and fire up the same AMI image, troubleshoot and then come back to you with some answers 👍

Best Regards

James Spurin

yangzhuronghuang commented 2 years ago

Hi @spurin No problem. Thanks in advance!

spurin commented 2 years ago

Hi @yangzhuronghuang

I spun up the instance today, using the ami you mentioned -

'aws-elasticbeanstalk-amzn-2018.03.20.x86_64-docker-hvm-202002250058 - ami-00043ff468d078003'

After exploring this, I noticed that it doesn't have systemd and seems to be a specific image intended for use with AWS Elastic Beanstalk. I can see why you used it though, as it shows up when searching for docker in the AMI search.

As the containers use systemd, we need a base that uses systemd, which you'd typically find on most modern Linux systems, even AWS Linux 2 (but not this particular image, maybe something to do with it being in use for Elastic Beanstalk).
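A quick way to check whether a host runs systemd before starting the lab is to look at the process running as PID 1. The following is a minimal sketch and not part of the lab itself; `init_name` and `is_systemd` are hypothetical helper names introduced here for illustration.

```shell
#!/bin/sh
# Print the command name of PID 1 (the host's init system).
# Falls back to /proc/1/comm if ps does not support these options.
init_name() {
    ps -p 1 -o comm= 2>/dev/null || cat /proc/1/comm
}

# Succeed only when the given init name is exactly "systemd".
is_systemd() {
    [ "$1" = "systemd" ]
}

if is_systemd "$(init_name)"; then
    echo "PID 1 is systemd"
else
    echo "PID 1 is not systemd (found: $(init_name))"
fi
```

On a host like the Elastic Beanstalk image described above, this would be expected to report that PID 1 is not systemd.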

So instead, I set up a new system with -

'Amazon Linux 2 AMI (HVM), SSD Volume Type - ami-087c17d1fe0178315 (64-bit x86) / ami-029c64b3c205e6cce (64-bit Arm)'

The lab environment should work with both Intel and Arm. Just for testing purposes, I picked the Arm image, as these are quite reasonably priced on AWS at the moment. Here are the commands I ran (as the ec2-user) to get this working -

Install git

sudo yum install git

Install and configure docker

sudo yum install docker
sudo service docker start
sudo usermod -a -G docker ec2-user

At this point, I ran exit and logged back in so that the new docker group membership takes effect

exit

Intel Only - Next we install docker-compose (most recent version), if you're using Intel you can just run -

sudo curl -L https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

However, if you're using ARM or wish to build docker-compose from source on Intel, it's the following -

sudo yum groupinstall "Development Tools"
sudo yum install python3-devel
sudo python3 -m pip install -IU wheel
sudo python3 -m pip install -IU docker-compose
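The choice between the two install routes above depends on the machine architecture. As a rough sketch, it can be decided from the output of `uname -m`; the function name `install_method` and the labels "binary" and "pip" are hypothetical, introduced here only for illustration.

```shell
#!/bin/sh
# Map a machine architecture (as printed by `uname -m`) to an install route:
#   "binary" -> download the prebuilt docker-compose release binary (Intel)
#   "pip"    -> install docker-compose with pip, as in the commands above
install_method() {
    case "$1" in
        x86_64) echo "binary" ;;
        *)      echo "pip" ;;
    esac
}

echo "Suggested docker-compose install route: $(install_method "$(uname -m)")"
```

Anything other than x86_64 falls through to the pip route, since that route is the one described above as working on both Arm and Intel.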

Clone the repo and move to the directory

git clone https://github.com/spurin/diveintoansible-lab.git
cd diveintoansible-lab/

Configure the .env file, set the following lines -

CONFIG=/home/ec2-user/diveintoansible-lab/config
ANSIBLE_HOME=/home/ec2-user/diveintoansible-lab/ansible_home
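If you'd rather not edit the file by hand, the two lines can be rewritten with sed. This is a minimal sketch, assuming the .env file already contains lines beginning with CONFIG= and ANSIBLE_HOME= and that GNU sed is available; `set_lab_paths` is a hypothetical helper name, not something shipped with the lab.

```shell
#!/bin/sh
# Rewrite the CONFIG and ANSIBLE_HOME lines of an .env file in place.
#   $1: path to the .env file
#   $2: absolute path of the diveintoansible-lab checkout
# Assumes GNU sed (-i) and a path that contains no "|" or "&" characters.
set_lab_paths() {
    sed -i \
        -e "s|^CONFIG=.*|CONFIG=$2/config|" \
        -e "s|^ANSIBLE_HOME=.*|ANSIBLE_HOME=$2/ansible_home|" \
        "$1"
}

# Example (run from inside the checkout):
#   set_lab_paths .env "$PWD"
```

All other lines in the file, such as the port settings, are left untouched.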

Edit the docker-compose.yaml file, and put an override for localhost with the ip of your instance using the LOCALHOST_OVERRIDE parameter, i.e.

  portal:
    hostname: portal
    container_name: portal
    image: spurin/diveintoansible:portal
    environment:
     - LOCALHOST_OVERRIDE=18.207.160.112
     - NGINX_ENTRYPOINT_QUIET_LOGS=1
    depends_on:
     - centos1
     - centos2
     - centos3
     - ubuntu1
     - ubuntu2
     - ubuntu3
    ports:
     - "1000:80"
    networks:
     - diveinto.io
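The IP used for LOCALHOST_OVERRIDE is the instance's public address. One possible way to look it up from the instance itself is the EC2 instance metadata service. The sketch below assumes the instance has a public IPv4 address and that the metadata service is reachable; `fetch_public_ip` and `looks_like_ipv4` are hypothetical helper names.

```shell
#!/bin/sh
# Fetch this instance's public IPv4 address from the EC2 instance metadata
# service (IMDSv2: request a session token first, then query with it).
# Only works when run on an EC2 instance.
fetch_public_ip() {
    token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
    curl -s -H "X-aws-ec2-metadata-token: $token" \
        "http://169.254.169.254/latest/meta-data/public-ipv4"
}

# Shape check only: four dot-separated groups of 1-3 digits.
# It does not verify that each group is in the range 0-255.
looks_like_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Example:
#   ip=$(fetch_public_ip) && looks_like_ipv4 "$ip" && echo "LOCALHOST_OVERRIDE=$ip"
```

The shape check guards against pasting an error page or an empty string into the compose file. Note that a public IP can change when an instance is stopped and started, so the value may need updating.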

Start the lab

docker-compose up

With this done, you should see the Attaching line without any errors after it -

Attaching to ubuntu-c, ubuntu2, centos2, centos3, docker, centos1, ubuntu3, ubuntu1, portal
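To confirm from a second terminal that all nine containers stayed up, the running container names can be compared against the names in the Attaching line. This is a minimal sketch; `missing_containers` is a hypothetical helper name.

```shell
#!/bin/sh
# Container names taken from the "Attaching to ..." line above.
EXPECTED="centos1 centos2 centos3 docker portal ubuntu-c ubuntu1 ubuntu2 ubuntu3"

# Read running container names on stdin (one per line) and print every
# expected name that is not among them. Prints nothing if all are running.
missing_containers() {
    running=$(cat)
    for name in $EXPECTED; do
        printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Example:
#   docker ps --format '{{.Names}}' | missing_containers
```

Empty output means every expected container is running. Any name printed is a container that exited, as the Ubuntu containers did in the original report.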

As this is running in the cloud, you'll need to configure your security group so that your local IP address has access to the instance's IP address on the ports mentioned. An easy way to do this is to edit the security group and add the rule 'Type: All TCP' and 'Source: My IP'. Save the rule and you should then be able to browse to your instance's IP on port 1000, i.e. in my case this was - http://18.207.160.112:1000/
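As a narrower alternative to 'All TCP', the rule can be limited to the ports the lab actually names. The sketch below only prints AWS CLI commands rather than running them. It assumes the portal is published on port 1000 and that the remaining ports are the *_PORT_* values in .env; the function names, the security group ID and the CIDR in the example are hypothetical placeholders.

```shell
#!/bin/sh
# Print the ports named for the lab: the portal port (1000) plus every
# *_PORT_* value found in the given .env file.
#   $1: path to the .env file
lab_ports() {
    echo 1000
    sed -n 's/^[A-Z0-9]*_PORT_[A-Z]*=\([0-9][0-9]*\)$/\1/p' "$1"
}

# Print (not run) one AWS CLI ingress command per port.
#   $1: .env path, $2: security group ID, $3: your IP in CIDR form
print_ingress_rules() {
    for port in $(lab_ports "$1"); do
        echo "aws ec2 authorize-security-group-ingress --group-id $2 --protocol tcp --port $port --cidr $3"
    done
}

# Example (sg-0123456789abcdef0 and 203.0.113.10/32 are placeholders):
#   print_ingress_rules .env sg-0123456789abcdef0 203.0.113.10/32
```

Printing the commands first lets you review them before applying anything to the security group.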

Hope this helps and please let me know how you get on.

Thanks

James Spurin

yangzhuronghuang commented 2 years ago

Hi @spurin

Thanks for the detailed process! I will try it and give you feedback some time today.

yangzhuronghuang commented 2 years ago

Hi @spurin I tested and it worked. Thanks a lot for your time and effort!