nextcloud / ansible-collection-nextcloud-admin

The ansible galaxy for your nextcloud administrative needs.
https://galaxy.ansible.com/nextcloud/admin
BSD 2-Clause "Simplified" License

Fix GitHub Actions Docker issue #318

Closed wiktor2200 closed 1 month ago

wiktor2200 commented 11 months ago

Hi! @staticdev @aalaesar I've tried to fix a problem with the Docker Molecule test, but I've run out of ideas. Could you take a look? Maybe you have some other solutions for this problem?

@geerlingguy Sorry to bother you, but do you have any idea what could have gone wrong here? Have you ever seen such an error with your Ansible images? https://github.com/nextcloud/ansible-collection-nextcloud-admin/actions/runs/6836211015/job/18590869076?pr=318#step:7:110

  failed: [localhost] (item={'failed': 0, 'started': 1, 'finished': 0, 'ansible_job_id': 'j801881025068.2108', 'results_file': '/home/runner/.ansible_async/j801881025068.2108', 'changed': True, 'item': {'cgroupns_mode': 'host', 'command': '', 'image': 'docker.io/geerlingguy/docker-debian12-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:rw']}, 'ansible_loop_var': 'item'}) => {"ansible_job_id": "j801881025068.2108", "ansible_loop_var": "item", "attempts": 8, "changed": false, "finished": 1, "item": {"ansible_job_id": "j801881025068.2108", "ansible_loop_var": "item", "changed": true, "failed": 0, "finished": 0, "item": {"cgroupns_mode": "host", "command": "", "image": "docker.io/geerlingguy/docker-debian12-ansible:latest", "name": "instance", "pre_build_image": true, "privileged": true, "volumes": ["/sys/fs/cgroup:/sys/fs/cgroup:rw"]}, "results_file": "/home/runner/.ansible_async/j801881025068.2108", "started": 1}, "msg": "Error creating container: 500 Server Error for http+docker://localhost/v1.43/containers/create?name=instance: Internal Server Error (\"symlink /proc/mounts /var/lib/docker/fuse-overlayfs/4441cd54c476cdd29d6f1ded1e93781e3c3929ca7407bbc645bd90b92c4c22e2-init/merged/etc/mtab: file exists\")", "results_file": "/home/runner/.ansible_async/j801881025068.2108", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
aalaesar commented 11 months ago

Hello @wiktor2200, thank you for taking the time to fix the CI. I've also been trying to fix it on another branch, but with no success. :disappointed: I mostly suspect the issue is in our code, since the Ansible image we use is popular and I couldn't find anyone with a similar issue. Let's see if Jeff Geerling can help us :wink:

Edit: just a thought. Maybe we are upgrading Ansible too fast with Dependabot for us to keep up with Ansible/Molecule changes.

Regards

geerlingguy commented 11 months ago

It looks like the error is:

Error creating container: 500 Server Error for http+docker://localhost/v1.43/containers/create?name=instance: Internal Server Error ("symlink /proc/mounts /var/lib/docker/fuse-overlayfs/4441cd54c476cdd29d6f1ded1e93781e3c3929ca7407bbc645bd90b92c4c22e2-init/merged/etc/mtab: file exists")

I've seen similar file mount issues in GitHub Actions sometimes, but haven't in the past few months. Is this only with debian12?
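
To make the quoted error easier to compare across failed runs, the relevant pieces can be pulled out of the message programmatically. The following is only a minimal sketch: the helper name `parse_symlink_error` and the regular expression are illustrative assumptions, not part of Molecule, Docker, or this repository. It extracts the symlink target, the path that already existed, and the directory name under `/var/lib/docker` (here `fuse-overlayfs`).

```python
import re

# The Docker error quoted above, with the JSON escaping removed.
MSG = (
    "Error creating container: 500 Server Error for "
    "http+docker://localhost/v1.43/containers/create?name=instance: "
    'Internal Server Error ("symlink /proc/mounts '
    "/var/lib/docker/fuse-overlayfs/"
    "4441cd54c476cdd29d6f1ded1e93781e3c3929ca7407bbc645bd90b92c4c22e2-init"
    '/merged/etc/mtab: file exists")'
)

# Hypothetical pattern for this one error shape; other Docker errors will not match.
SYMLINK_RE = re.compile(
    r"symlink (?P<target>\S+) "
    r"(?P<link>/var/lib/docker/(?P<driver>[^/]+)/(?P<layer>[^/]+)/\S+?)"
    r": file exists"
)


def parse_symlink_error(msg):
    """Return the parts of a 'symlink ... file exists' error, or None if absent."""
    match = SYMLINK_RE.search(msg)
    return match.groupdict() if match else None


if __name__ == "__main__":
    info = parse_symlink_error(MSG)
    print(info["driver"], info["target"], info["link"], sep="\n")
```

Grouping failures by the `driver` and `link` fields would show whether every random failure hits the same `etc/mtab` path or whether several different errors are involved.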

wiktor2200 commented 11 months ago

Hello Jeff! Thanks a lot for getting involved, I really appreciate that :)

When we searched for this error, we found very few reports, which is why I asked. It occurs randomly in all of our Molecule test scenarios (Debian 11 and 12, Ubuntu 20.04 and 22.04). We define the scenarios this way: https://github.com/nextcloud/ansible-collection-nextcloud-admin/blob/20ab659c9d5eeaef1d091d4571bb17623e47edb8/.github/workflows/tests.yml#L20-L23

We then run them with: https://github.com/nextcloud/ansible-collection-nextcloud-admin/blob/20ab659c9d5eeaef1d091d4571bb17623e47edb8/.github/workflows/tests.yml#L51

Molecule itself is defined here: https://github.com/nextcloud/ansible-collection-nextcloud-admin/blob/20ab659c9d5eeaef1d091d4571bb17623e47edb8/molecule/default/molecule.yml#L7-L15

And since it's a matrix, when one job fails, the rest are cancelled. In this PR I tried cleaning the Docker cache with docker system prune (inspired by your old blog post: https://www.jeffgeerling.com/blog/2018/testing-your-ansible-roles-molecule), and when that didn't help, I tried molecule reset.
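
The cleanup-then-retry approach described in this comment could be sketched roughly as follows. This is an illustration only: the function names, the retry count, and the marker string are assumptions, and nothing in this thread shows that the cleanup actually resolves the error. The only external commands used are `molecule test -s <scenario>`, `docker system prune --force`, and `molecule reset`.

```python
import subprocess

# Substring of the error seen in the failing job; used to detect this specific failure.
TRANSIENT_MARKER = "etc/mtab: file exists"

# Cleanup steps mentioned in the comment above, in the order they were tried.
CLEANUP_COMMANDS = [
    ["docker", "system", "prune", "--force"],
    ["molecule", "reset"],  # acts on Molecule's default scenario
]


def run(cmd):
    """Run a command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def run_scenario_with_cleanup(scenario, attempts=2, runner=run):
    """Run one Molecule scenario; on the mtab error, clean up and try again.

    `runner` is injectable so the control flow can be exercised without Docker.
    """
    for attempt in range(1, attempts + 1):
        code, output = runner(["molecule", "test", "-s", scenario])
        if code == 0:
            return True
        # Give up on unrelated failures, or once the attempts are used up.
        if TRANSIENT_MARKER not in output or attempt == attempts:
            return False
        for cmd in CLEANUP_COMMANDS:
            runner(cmd)
    return False
```

Retrying only when the marker string is present keeps genuine test failures from being masked by the retry.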

aalaesar commented 11 months ago

Hello there @wiktor2200, I found this thread on the Linux Containers forum that looks a lot like our issue. Is there a way to check whether our GitHub Actions are running on top of LXD?
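
One possible way to check this from inside a job is sketched below. It relies on the assumption that container managers such as LXC/LXD expose a `container=<type>` variable (for example `container=lxc`) in the environment of PID 1, and that the job is allowed to read `/proc/1/environ`, which normally requires root. The helper names are hypothetical, and a `None` result does not prove that the runner is not a container.

```python
from pathlib import Path


def container_from_environ(raw):
    """Return the value of the `container` variable from NUL-separated environ bytes."""
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep and key == b"container":
            return value.decode()
    return None


def detect_container(path="/proc/1/environ"):
    """Best-effort check of PID 1's environment; None if unreadable or not set."""
    try:
        return container_from_environ(Path(path).read_bytes())
    except OSError:
        # Missing file or insufficient permissions: nothing can be concluded.
        return None


if __name__ == "__main__":
    print(detect_container())
```

Running this as an early workflow step and printing the result would give one data point, although it only inspects the runner itself and not whatever the runner is hosted on.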

staticdev commented 11 months ago

Hello all, I have been super busy with some other Ansible issues and construction (like @geerlingguy =p), and I don't really understand why this issue is happening. Most of my roles are tested against the same images and I don't get this error. I would suggest trying Podman instead of Docker, since I have mostly replaced Docker with Podman now. It is an alternative solution.

staticdev commented 10 months ago

@wiktor2200 Thanks for trying it out. I pointed out some potential issues with the current state of the PR in the comments.

aalaesar commented 8 months ago

Hello there. I noticed that we are no longer running into this issue... somehow it disappeared. I'll keep the PR in draft for now until we are confident the issue is gone for good. Regards

aalaesar commented 1 month ago

The initial issue is gone and CI has been fixed. Closing.