Closed erkannt closed 9 months ago
Error message of failed main.yml
GH Action run indicates that our EC2 node has run out of space. As I can't access the node via SSH I will attempt to recreate the node using the ijm-infra repo.
Previously this caused the node's IP to change, which we then needed to specify via GH Action secrets so that the deploy pipeline of this repo could succeed (see #160).
The culprit of the space issue is Docker's overlay2 folder (/var/lib/docker/overlay2).
Investigating root cause:
Identify which folders are consuming space in overlay2:
du -s /var/lib/docker/overlay2/*/diff |sort -n -r # identify critical folder(s)
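If the list is long, the same check can be wrapped in a small helper that only prints the biggest entries. This is just a sketch: `largest_layers` is a made-up name, and the default of 10 entries is an arbitrary choice. It assumes the default overlay2 location and needs to run as root on the node.

```shell
#!/bin/sh
# Sketch only: largest_layers is a hypothetical helper, not part of Docker.
# Prints the N largest "diff" directories under an overlay2 root, biggest first.
largest_layers() {
    root="${1:-/var/lib/docker/overlay2}"   # assumed default data root
    count="${2:-10}"                         # arbitrary default
    # -s: one total per directory; -k: sizes in KiB so sort -n compares like with like
    du -sk "$root"/*/diff 2>/dev/null | sort -n -r | head -n "$count"
}
```

Example: `largest_layers /var/lib/docker/overlay2 5` shows the five largest layer folders.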
Find the corresponding docker container:
docker inspect $(docker container ls -q) | grep PART-OF-OFFENDING-FOLDER-NAME -B 300 -A 300 | less
Find out which folders were added or changed since container creation:
docker diff ijm-prod_journal_1 | grep '^A\|^C' | cut -f 2 -d " " | sort
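A slightly more defensive variant of the filter above, in case a path contains spaces (the `cut -d " "` version would truncate it). This is a sketch, and `added_or_changed` is a hypothetical helper name. It relies on `docker diff` printing one change per line as a type letter (A, C or D), a space, then the path.

```shell
#!/bin/sh
# Sketch only: added_or_changed is a hypothetical helper name.
# Reads `docker diff` output on stdin and prints the added/changed paths, sorted.
added_or_changed() {
    # Keep lines of type A (added) or C (changed), drop deletions (D),
    # strip the two-character "X " prefix, then sort the remaining paths.
    grep -E '^[AC] ' | cut -c 3- | sort
}
```

Usage: `docker diff ijm-prod_journal_1 | added_or_changed`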
Looking at the output of docker exec -it ijm-prod_journal_1 sh -c 'du -sh /app/var/*', we should probably keep an eye on the size of the cache. I also found a 2GB log file on the server that looks like it was created by the application, so the logs folder needs watching too.
109M /app/var/cache
576K /app/var/logs
After a deploy the fresh container's var dir is a decent chunk smaller:
25M /app/var/cache
36K /app/var/logs
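To keep an eye on these two folders without checking by hand, something like the following could be run periodically. This is a sketch under assumptions: `dir_over_limit` is a hypothetical helper, the limits in the usage line are placeholders rather than agreed thresholds, and it has to run where /app/var is visible (for example inside the container).

```shell
#!/bin/sh
# Sketch only: dir_over_limit is a hypothetical helper name.
# Succeeds (and prints a warning) if a directory uses more than limit_kb KiB.
dir_over_limit() {
    dir="$1"
    limit_kb="$2"
    # du -sk prints "<size in KiB><tab><path>"; keep only the size
    used_kb=$(du -sk "$dir" | awk '{ print $1 }')
    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "WARNING: $dir uses ${used_kb}K (limit ${limit_kb}K)"
        return 0
    fi
    return 1
}
```

Usage, with placeholder limits of 500M for the cache and 100M for the logs: `dir_over_limit /app/var/cache 512000; dir_over_limit /app/var/logs 102400`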
Closing as we have mitigated the issue and now have ways to resolve this more quickly in the future:
docker system prune -a
or recreate journal using deploy.sh in /home/ec2-user/ijm-prod
Currently the node disk is at 50% with 8.6G free space. Assets are currently 2.5G, the containers and images in a clean state seem to consume 3.7G.
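For catching this earlier next time, a minimal sketch of a disk usage check that could be run from cron on the node. `disk_use_percent` is a hypothetical helper name and the 80% threshold in the usage line is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch only: disk_use_percent is a hypothetical helper name.
# Prints the used-space percentage (without the % sign) of the filesystem holding a path.
disk_use_percent() {
    # df -P prints one line per filesystem in POSIX format; column 5 is the capacity, e.g. "50%"
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Usage, with an assumed 80% threshold: `[ "$(disk_use_percent /)" -ge 80 ] && echo "disk is getting full"`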
/cc @BlueReZZ @pbronka
Thank you, @erkannt and @BlueReZZ, for looking into this and fixing the problem. We're all very happy that the website works well again.
Hi @erkannt, I'm going to re-open this issue because I tried making a very small change (fixing a typo in an article) and the CI tests failed. Do you have any thoughts on what might be going on here? https://github.com/microsimulation/ijm/actions/runs/7151707128/job/19476285635
The pipeline is failing due to an unrelated issue: there is a failing feature test. I have created a new ticket (#204) rather than polluting this one.
Refs: #200