os-autoinst / linux-qa

Repository used to coordinate work efforts across linux testing projects
Apache License 2.0

Collect different approaches to provisioning workers (permanent and transient) #2

Open ssssam opened 4 months ago

ssssam commented 4 months ago

Here's how transient workers are currently implemented in GNOME OS.

There are no permanent worker machines set up.

Tests are started from Gitlab CI pipelines. See the .gitlab-ci.yml in openqa-tests.git for how this is implemented. Each Gitlab CI job for QA does the following:

  1. Download the registry.opensuse.org/devel/openqa/containers15.6/openqa_worker:latest image and create a container.
  2. Clone the repo that holds the tests (in this case, openqa-tests.git, which contains the GNOME OS tests).
  3. Fetch test media. This uses utils/fetch_test_media.sh (a small wrapper around Curl) to download the ISO and/or disk image from the S3 bucket where they are stored.
  4. Create /etc/openqa/client.conf with an openqa.gnome.org API key (which is passed in via Gitlab CI variables).
  5. Create /etc/openqa/worker.conf with a unique worker class. The worker class contains the Gitlab CI pipeline number.
  6. Call /run_openqa_worker.sh (the container entrypoint script) to start the worker daemon, which connects to the web UI and registers itself as a new worker.
  7. Start the jobs. See the utils/start_all_jobs.sh helper script, which uses openqa-cli to call the POST isos endpoint. We pass in config/scenario_definitions.yaml, and openQA creates a job for each job_template defined in that file. Finally it parses the JSON response to get a list of job IDs.
  8. Poll the status of each job. See the utils/wait_for_job.sh script. When all jobs have finished or been cancelled, it exits with 0 (all succeeded) or 1 (something failed or was cancelled).
  9. In all cases, upload some logs as Gitlab CI artifacts and generate a JUnit test report.
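The per-pipeline worker class and the job-ID parsing in the steps above can be sketched roughly like this. This is a hedged illustration, not the actual scripts: the `CI_PIPELINE_ID` fallback value, the `gitlab-ci-pipeline-` class prefix, the `/tmp/demo-etc-openqa` path (standing in for `/etc/openqa`), and the example JSON response are all assumptions for demonstration.

```shell
#!/bin/sh
set -eu

# GitLab CI provides CI_PIPELINE_ID; the fallback here is purely for demo runs.
CI_PIPELINE_ID="${CI_PIPELINE_ID:-123456}"
WORKER_CLASS="gitlab-ci-pipeline-${CI_PIPELINE_ID}"

# Step 5 above: write a worker.conf with a unique WORKER_CLASS so that the
# jobs posted by this pipeline can only be picked up by this transient worker.
# (In the real container this would be /etc/openqa/worker.conf.)
CONF_DIR=/tmp/demo-etc-openqa
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/worker.conf" <<EOF
[global]
WORKER_CLASS = ${WORKER_CLASS}
HOST = https://openqa.gnome.org
EOF

# Step 7 above: the POST isos call returns JSON with the created job IDs,
# e.g. {"ids":[101,102],"count":2} (shape assumed here). Extract the IDs
# with plain POSIX tools so no extra dependencies are needed.
RESPONSE='{"ids":[101,102],"count":2}'
JOB_IDS=$(printf '%s' "$RESPONSE" \
    | sed -n 's/.*"ids":\[\([0-9,]*\)\].*/\1/p' \
    | tr ',' ' ')
printf '%s\n' "$JOB_IDS" > "$CONF_DIR/job_ids"
echo "job ids: $JOB_IDS"
```

A polling loop (step 8) would then repeatedly query each of those IDs via openqa-cli until every job reaches a final state.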

Future improvements:

AdamWill commented 4 months ago

In Fedora openQA, everything is deployed via ansible - https://pagure.io/fedora-infra/ansible . Worker hosts are defined in the inventory in a few groups - https://pagure.io/fedora-infra/ansible/blob/main/f/inventory/inventory#_348 . There are some significant definitions in the group vars, including firewall config required for tap stuff in https://pagure.io/fedora-infra/ansible/blob/main/f/inventory/group_vars/openqa_tap_workers . The playbook is https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/groups/openqa-workers.yml and the plays are in https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openqa/worker . Worker hosts are physical machines running plain Fedora (currently Fedora 40); they install the relevant openQA packages from Fedora's main repositories, where I maintain them.

We still use a very old-style deployment on Fedora, where the tests live on the server and are shared to the worker hosts via NFS, so checking out tests and so on happens on the server. The ansible plays take care of installing the packages, setting up networking, ensuring the NFS share is configured, enabling the worker instance services and so on.
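A minimal sketch of what such worker plays typically contain, assuming a setup like the one described above. The task names, the package name, the NFS server hostname, and the instance count here are illustrative assumptions, not copied from the fedora-infra ansible repo (the real plays live under roles/openqa/worker):

```yaml
# Illustrative ansible tasks for an openQA worker host.
- name: Install openQA worker packages
  package:
    name: openqa-worker
    state: present

- name: Mount the NFS share holding tests and assets
  mount:
    src: "openqa-server.example.org:/var/lib/openqa/share"  # hypothetical server
    path: /var/lib/openqa/share
    fstype: nfs
    state: mounted

- name: Enable and start the worker instance services
  service:
    name: "openqa-worker@{{ item }}"
    state: started
    enabled: true
  loop: [1, 2, 3]  # one systemd instance per worker slot
```

Each `openqa-worker@N` systemd instance registers with the web UI as a separate worker, which is why enabling instances is done with a loop.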