linux-system-roles / tox-lsr

tox plugin for testing linux system roles locally via tox
MIT License

Run CI in a CentOS 9 container using runcontainer.sh #137

Status: Open. nhosoi opened this issue 1 year ago.

nhosoi commented 1 year ago

I'm attempting to run the CI tests using tox-lsr's runcontainer.sh.

Here is a sample command line:

tox -e container-ansible-core-2.15 -- --erase-old-snapshot --parallel 1 \
  --image-name centos-9 tests/tests_*.yml

To make the command complete successfully, I had to make changes to runcontainer.sh as well as to many of the roles. The following text describes my work. In the attachment, I first propose some fixes to runcontainer.sh in tox-lsr. Then, in the SUMMARY section, I classify the roles by what they required to pass the tests.

runcontainer.sh

https://github.com/nhosoi/tox-lsr/tree/runcontainer

Modify runcontainer.sh

- Install the EXTRA_RPM packages as part of COMMON_PKGS.
  This is necessary to allow extra RPM packages to be downloaded
  from the HA repositories.
- Modify the EXTRA_SKIP_TAGS handling.
  *  Always skip tests tagged with tests::reboot.
  *  Merge EXTRA_SKIP_TAGS into CONTAINER_SKIP_TAGS and
     pass the latter to ansible-playbook.
- Add tests/collection-requirements.yml to the loop that installs the
  test requirements using `ansible-galaxy collection`.
- Add LSR_TOX_ENV_TMP_DIR to ANSIBLE_COLLECTIONS_PATHS if any
  collections are installed in that path.
- Add `--force` to `ansible-galaxy collection install` to make sure
  the collection is installed.
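The skip-tag and collections-path changes listed above can be illustrated with a small bash sketch. The function names merge_skip_tags and collections_path_with_tmp are hypothetical, and the logic is only an assumption about how such changes could look; it is not the code in the runcontainer branch. It assumes the tag lists are comma-separated, which is the form ansible-playbook's --skip-tags accepts.

```shell
#!/bin/bash
# Hypothetical sketch only; this is not the code in runcontainer.sh.

# Merge comma-separated CONTAINER_SKIP_TAGS and EXTRA_SKIP_TAGS into one
# list, always including tests::reboot and dropping duplicates.
merge_skip_tags() {
    local container_tags="$1" extra_tags="$2"
    local merged="tests::reboot" tag
    # Turn commas into spaces so each tag becomes a separate word
    for tag in ${container_tags//,/ } ${extra_tags//,/ }; do
        case ",$merged," in
            *",$tag,"*) ;;                  # already in the list
            *) merged="$merged,$tag" ;;
        esac
    done
    printf '%s\n' "$merged"
}

# Prepend the tox env temp dir to the collections path, but only when
# ansible-galaxy actually installed collections there
# (they land in <dir>/ansible_collections/<namespace>/<name>).
collections_path_with_tmp() {
    local tmp_dir="$1" current="$2"
    if [ -d "$tmp_dir/ansible_collections" ]; then
        printf '%s\n' "$tmp_dir${current:+:$current}"
    else
        printf '%s\n' "$current"
    fi
}
```

The merged list could then be passed as `ansible-playbook --skip-tags "$CONTAINER_SKIP_TAGS" ...`, after installing the test collections with something like `ansible-galaxy collection install --force -p "$LSR_TOX_ENV_TMP_DIR" -r tests/collection-requirements.yml`.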

SUMMARY

1) Tests that work with no changes

2) Tests that work by adding additional packages to tests/setup-snapshot.yml

3) Tests that work by fixing a bug

4) Tests that work by adding a missing file

5) Tests that work by skipping tests with skip tags

6) Tests that work by adding additional packages to tests/setup-snapshot.yml and skipping tests with skip tags

7) Tests that work by making a fix in a test

8) Tests that work by skipping tests with skip tags and making a fix in a test

9) Tests that work by adding additional packages to tests/setup-snapshot.yml, and skipping tests with skip tags

10) Tests that work by adding additional packages to tests/setup-snapshot.yml, skipping tests with skip tags, and making a fix in a test

- network
    - skip_tag: tests::network_scripts
    - skip_tag: tests::container_bond_check
    - skip_tag: tests::container_op_not_permitted
    - skip_tag: tests::container_not_supported
    - fix: Skip installing network-scripts on CentOS 9
    - url: https://github.com/nhosoi/network/tree/runcontainer

11) Tox config file (pyproject.toml, tox.ini, or setup.cfg) not found

- image_builder
- sshd
- tuned

12) Architecture is limited to x86_64(?)

- mssql
    - Is there no way to run the tests on a non-x86_64 architecture?

13) Kernel modules are missing from the container image

- nbde_client
    - kernel_module: dm_mod

14) All tests rely on SELinux

- postfix
- selinux
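Categories 11 and 13 describe the environment rather than a bug in a role, so they could be checked before starting a run. A minimal bash sketch with hypothetical helper names: tox_config_file only checks that one of the three files exists (tox additionally needs a tox section inside setup.cfg or pyproject.toml for those to count), and kernel_module_present assumes that a module the tests need, such as dm_mod, must already be loaded on the host, because a container shares the host kernel. The optional sysfs argument exists only so the check can be tried against a scratch directory.

```shell
#!/bin/bash
# Hypothetical pre-flight checks (not part of tox-lsr).

# Category 11: print the first tox config candidate found in the role.
# Only checks that the file exists; the order here is arbitrary.
tox_config_file() {
    local role_dir="$1" f
    for f in pyproject.toml tox.ini setup.cfg; do
        if [ -f "$role_dir/$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Category 13: a loaded module normally shows up as /sys/module/<name>;
# the container usually cannot modprobe it, since the image typically
# has no /lib/modules tree for the host kernel.
kernel_module_present() {
    local name="$1" sysfs="${2:-/sys}"
    [ -d "$sysfs/module/$name" ]
}
```

For example, `kernel_module_present dm_mod || echo "load dm_mod on the host first"` would flag the nbde_client case before the tests start.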

richm commented 1 year ago

I don't think we should be using --extra-rpms.

Similarly with roles missing vars/CentOS_9.yml - what is the failure, and why don't we see errors in qemu or baseosci tests?

Re: certificate tests_basic_ipa.yml (see https://github.com/linux-system-roles/tox-lsr/blob/main/src/tox_lsr/test_scripts/runcontainer.sh#L5): I believe the test will work if you specify CONTAINER_HOSTNAME=something. What is the failure you see? This may also be related to the failures you are seeing in ha_cluster; the default hostname may be too long and be causing the quorum errors.

nhosoi commented 1 year ago

> You should not need to specify --extra-rpms. That should be handled in each role vars/main.yml for runtime rpms, and tests/setup-snapshot.yml for testing rpms - I'm not sure why the baseosci and qemu tests work without extra-rpms, but container testing needs them? That doesn't make sense to me, unless aarch64 has different package requirements?

Thank you, @richm. That makes sense. I have updated tests/setup-snapshot.yml in firewall, ha_cluster, logging, network, and pam_pwd, and adjusted the issue description accordingly.

nhosoi commented 1 year ago

> Similarly with roles missing vars/CentOS_9.yml - what is the failure, and why don't we see errors in qemu or baseosci tests?

Here is an example failure from ad_integration:

TASK [Install test packages] ***************************************************
task path: /home/nhosoi/linux-system-roles/ad_integration/tests/setup-snapshot.yml:11
Friday 08 September 2023  11:36:54 -0700 (0:00:00.020)       0:00:00.959 ****** 
fatal: [sut]: FAILED! => {}
MSG:
The task includes an option with an undefined variable. The error was: '__template_packages' is undefined. '__template_packages' is undefined
The error appears to be in '/home/nhosoi/linux-system-roles/ad_integration/tests/setup-snapshot.yml': line 11, column 7, but may be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
    - name: Install test packages
      ^ here

__template_packages, which is referenced in tests/setup-snapshot.yml, is defined in the following files (main branch). There is no CentOS_9.yml. I wonder whether setup-snapshot.yml is simply no longer used on CentOS 9 (or on RHEL 9, since RedHat_9.yml is not in the list either; is it no longer necessary?).

vars/CentOS_6.yml:__template_packages: []
vars/CentOS_7.yml:__template_packages: []
vars/CentOS_8.yml:__template_packages: []
vars/Fedora.yml:__template_packages: []
vars/RedHat_6.yml:__template_packages: []
vars/RedHat_7.yml:__template_packages: []
vars/RedHat_8.yml:__template_packages: []

And of course, since `__template_packages` is not currently used (it is an empty list in every file above), we could remove `CentOS_*.yml` and `RedHat_*.yml`, then remove the `Install test packages` task from `tests/setup-snapshot.yml`, which would solve the problem... But do we want to do that?
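One way to see why `__template_packages` ends up undefined on CentOS 9 is to check which vars file would supply it for a given platform. The sketch below is hypothetical (find_var_source is not an existing tool). It assumes the vars files are looked up in the order commonly used by the roles' set_vars.yml, namely `<os_family>.yml`, `<distribution>.yml`, `<distribution>_<major>.yml`, then `<distribution>_<version>.yml`, with missing files skipped; whether tests/setup-snapshot.yml resolves them the same way is also an assumption.

```shell
#!/bin/bash
# Hypothetical diagnostic. Assumed lookup order: <os_family>.yml,
# <distribution>.yml, <distribution>_<major>.yml, and optionally
# <distribution>_<version>.yml; files that do not exist are skipped.
find_var_source() {
    local role_dir="$1" var="$2" family="$3" distro="$4" major="$5"
    local version="${6:-}" f found=no
    local candidates=("$family.yml" "$distro.yml" "${distro}_${major}.yml")
    if [ -n "$version" ]; then
        candidates+=("${distro}_${version}.yml")
    fi
    for f in "${candidates[@]}"; do
        [ -f "$role_dir/vars/$f" ] || continue   # -f follows symlinks
        # Only matches top-level definitions such as "__template_packages: []"
        if grep -q "^${var}:" "$role_dir/vars/$f"; then
            echo "defined in vars/$f"
            found=yes
        fi
    done
    if [ "$found" = no ]; then
        echo "undefined: $var"
    fi
}
```

Given that only the files listed above define the variable, running `find_var_source . __template_packages RedHat CentOS 9` in the role directory would print `undefined: __template_packages`, matching the failure, while the same call with `8` reports vars/CentOS_8.yml.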

richm commented 1 year ago

I see - there are a couple of bugs in the roles.

nhosoi commented 1 year ago

Another missing CentOS 9 file case: ssh. The role has CentOS_* symlinks as follows:

$ ls -l vars/CentOS_* vars/RedHat_*
lrwxrwxrwx. 1 nhosoi nhosoi  12 May 15 11:03 vars/CentOS_6.yml -> RedHat_6.yml
lrwxrwxrwx. 1 nhosoi nhosoi  12 May 15 11:03 vars/CentOS_7.yml -> RedHat_7.yml
lrwxrwxrwx. 1 nhosoi nhosoi  12 May 15 11:03 vars/CentOS_8.yml -> RedHat_8.yml
-rw-r--r--. 1 nhosoi nhosoi 351 May 15 11:03 vars/RedHat_6.yml
-rw-r--r--. 1 nhosoi nhosoi 320 May 15 11:03 vars/RedHat_7.yml
-rw-r--r--. 1 nhosoi nhosoi 247 May 15 11:03 vars/RedHat_8.yml
-rw-r--r--. 1 nhosoi nhosoi 247 May 15 11:03 vars/RedHat_9.yml

If we don't create a CentOS_9.yml symlink, tests_backup.yml fails at:

TASK [Verify backup was not done in first, but in second attempt] **************
task path: /home/nhosoi/linux-system-roles/ssh/tests/tests_backup.yml:49
Friday 08 September 2023  17:11:16 -0700 (0:00:00.801)       0:00:10.198 ****** 
fatal: [sut]: FAILED! => {
    "assertion": "new_backup.files != []",
    "changed": false,
    "evaluated_to": false
}
MSG:
Assertion failed

This is because the following variables are not initialized correctly:

---
# This system supports drop in directory so defaults are adjusted
__ssh_supports_drop_in: true
__ssh_drop_in_name: "00-ansible"

# This default lists the main configuration file defaults
__ssh_defaults:
  Include: /etc/ssh/ssh_config.d/*.conf

richm commented 1 year ago

> Another missing CentOS 9 file case: ssh. The role has CentOS_* symlinks as follows:

ok - that definitely needs the symlink - probably didn't see this because we don't test with centos9 in baseosci

nhosoi commented 1 year ago

> ok - that definitely needs the symlink - probably didn't see this because we don't test with centos9 in baseosci

Thank you, @richm. Indeed, ssh was the only role that needed CentOS_9.yml. I reverted the changes to the other roles and updated the issue description: https://github.com/linux-system-roles/tox-lsr/issues/137#issue-1887959225.
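Because the existing CentOS_6.yml to CentOS_8.yml files in ssh are plain symlinks to their RedHat_N.yml counterparts, a missing CentOS_9.yml like this one can be detected mechanically across roles. The following is a minimal sketch; check_centos_vars is a hypothetical helper, not something in tox-lsr, and it assumes every RedHat_N.yml should have a CentOS_N.yml twin, which may not hold for every role.

```shell
#!/bin/bash
# Hypothetical helper (not part of tox-lsr or any role).
# For every vars/RedHat_N.yml in a role, report a missing vars/CentOS_N.yml;
# with "fix" as the second argument, also create it as a relative symlink,
# the same way CentOS_6.yml..CentOS_8.yml point to RedHat_6.yml..RedHat_8.yml.
check_centos_vars() {
    local role_dir="$1" mode="${2:-check}"
    local rh base suffix centos
    for rh in "$role_dir"/vars/RedHat_*.yml; do
        [ -e "$rh" ] || continue          # the glob matched nothing
        base=$(basename "$rh")            # e.g. RedHat_9.yml
        suffix=${base#RedHat_}            # e.g. 9.yml
        centos="$role_dir/vars/CentOS_$suffix"
        if [ ! -e "$centos" ] && [ ! -L "$centos" ]; then
            echo "missing: vars/CentOS_$suffix"
            if [ "$mode" = fix ]; then
                ln -s "$base" "$centos"   # link target is relative to vars/
            fi
        fi
    done
}
```

For ssh, `check_centos_vars path/to/ssh fix` would have the same effect as running `ln -s RedHat_9.yml CentOS_9.yml` inside the role's vars/ directory.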

richm commented 11 months ago

@nhosoi Haven't heard from you in a while - hope you are well. What is remaining to do with this issue?

nhosoi commented 11 months ago

> @nhosoi Haven't heard from you in a while - hope you are well. What is remaining to do with this issue?

Hi @richm, I'm doing just fine. Thank you!

The purpose of this issue was to run the CI tests in an aarch64 container using runcontainer.sh (as far as possible). I had proposed changes to quite a number of roles. We discussed them in September, and I believe I responded and updated my proposal based on that discussion. However, it stopped there. Since many improvements have been made to the system roles since then, I don't think my proposal still applies... As there seems to be no demand for running the CI tests in a container, we should probably close this issue. Thanks.