spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

How are you guys testing this? #1585

Open pratikbin opened 2 years ago

pratikbin commented 2 years ago

Thank you for the awesome project.

One question though: are you testing the Ansible code in CI or anywhere else, before/after merging into master/main?

spantaleev commented 2 years ago

Right now, we're not doing any automated testing.

We used to have some Ansible linting CI job (#1469, #1471), but it was unstable and not very useful, so I've gotten rid of it (ac6049516632a15db).

It would probably be nice to have some functional test suite running against some throw-away VM or something, to ensure the playbook is not completely broken. If one uses self-signed certificates (matrix_ssl_retrieval_method: self-signed), it could probably be done without a public hostname and without hitting Let's Encrypt rate-limits, etc.
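For illustration, the extra vars for such a throwaway test run could be as small as this (a hypothetical host_vars sketch; the domain value is just a placeholder):

```yaml
# Hypothetical host_vars for a throwaway test VM (sketch, not a tested config)
matrix_domain: example.test

# Avoid Let's Encrypt entirely, so no public hostname is needed:
matrix_ssl_retrieval_method: self-signed

# Alternative, closer to a real deployment (needs a publicly reachable host):
# matrix_ssl_lets_encrypt_staging: true
```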

pratikbin commented 2 years ago

Okay, I'm thinking about using Vagrant. I'm testing locally with it and will post an update here.

Marwel commented 2 years ago

Hi, I'm also interested in this. I already made a pull request for a yamllint action and some cleanup. Linting per se is not really a test, but it can be helpful for keeping the codebase clean.

@spantaleev What was wrong with ansible-lint?

I would like to help with this topic. Another suggestion for real tests would be Molecule, maybe in combination with Vagrant.
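For reference, a minimal Molecule scenario using the Vagrant driver might look roughly like this (an untested sketch; the box and instance names are assumptions, and the molecule-vagrant plugin would be required):

```yaml
# molecule/default/molecule.yml (rough, untested sketch)
dependency:
  name: galaxy
driver:
  name: vagrant          # provided by the molecule-vagrant plugin
platforms:
  - name: matrix-test
    box: debian/bullseye64
provisioner:
  name: ansible
verifier:
  name: ansible
```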

spantaleev commented 2 years ago

ansible-lint broke for unknown reasons. Here's one PR where you could see it failing: https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/1521

It also wasn't terribly helpful, so it didn't seem like something worth bothering over. I suppose the way we were calling that action (see ac6049516632a15d) was fragile for some reason. We were pinning versions there, but that doesn't seem to have been enough to give us a stable ansible-lint installation. Some of its version constraints may be unbounded or something.

Marwel commented 2 years ago

Ah OK, so the ansible-lint installation broke. Interesting. There are not that many actions that provide ansible-lint. If you're not that interested in ansible-lint, what about Ansible syntax checking?
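For illustration, a syntax check could be a single small workflow (a hypothetical sketch; the workflow file name is an assumption, and whether a bare --syntax-check passes cleanly for a playbook of this size would need to be verified):

```yaml
# .github/workflows/syntax-check.yml (hypothetical sketch)
name: Ansible syntax check
on: [push, pull_request]
jobs:
  syntax-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install ansible
      # Parse the playbook and all imported roles/tasks without running anything
      - run: ansible-playbook setup.yml --syntax-check
```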

spantaleev commented 2 years ago

I'm fine with having lint, syntax checking, etc.

The problem I have with them is that they will only do something of value when the syntax finally breaks after merging some PR or something. That seems rare (it hasn't happened yet), in comparison to the CI process itself being fragile and failing with false positives.

We could always give such a thing another go though.

Marwel commented 2 years ago

@pratikbalar Do you have something to show? I would like to discuss further steps to avoid wasting time on repeating the same work.

pratikbin commented 2 years ago

@Marwel I think ansible-lint is adequate for now.

Earlier, I was thinking of testing it with Vagrant, but how to test various parameters/scenarios and how to verify the result is not clear yet.

Marwel commented 2 years ago

For ansible-lint, we have to find a suitable and reliable action.

Furthermore, I would like to discuss testing at least one scenario, e.g. Debian 11, running setup.yml with default vars and a few extras, so that we don't use Let's Encrypt at the start. If anyone has an idea on how to test with LE (staging), I'll gladly take it.

When this default scenario is done, we can expand to other scenarios like different distros/versions and also test some edge cases which are not covered by the defaults.

IMO the defaults should be the first scenario, because that is what most people are going to start with.

spantaleev commented 2 years ago

> Earlier, I was thinking of testing it with Vagrant, but how to test various parameters/scenarios and how to verify the result is not clear yet.

It would be great if we could spin up a VM, run --tags=setup-all,start against it and ensure this command doesn't fail (due to syntax or other logical problems). It could run --tags=self-check as well, confirming that we end up with a working server.

To prevent hitting Let's Encrypt rate limits, it should either use the Let's Encrypt staging environment (matrix_ssl_lets_encrypt_staging: true) or just use self-signed certificates (matrix_ssl_retrieval_method: self-signed). The former is closer to an actual deployment, while the latter is quicker and more independent (and likely good enough?).
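Roughly, such a CI job could boil down to a few steps like these (a hypothetical fragment; the inventory path is a placeholder and provisioning the throwaway VM itself is left out):

```yaml
# Hypothetical job steps for a functional test against a throwaway VM
steps:
  - uses: actions/checkout@v3
  - run: pip install ansible
  # Install and start everything, using self-signed certificates so that no
  # public hostname or Let's Encrypt interaction is needed
  - run: >
      ansible-playbook -i tests/inventory/hosts setup.yml
      --tags=setup-all,start
      -e matrix_ssl_retrieval_method=self-signed
  # Confirm that the resulting server actually works
  - run: ansible-playbook -i tests/inventory/hosts setup.yml --tags=self-check
```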

We could have multiple tests:

Running the full test-matrix (all supported distros × all architectures × all tests) will be very heavy and may even make us hit some Docker Hub rate limits, so we should be careful with that.


I agree that starting with just 1 distro and 1 architecture and some relatively-default variables will be good enough for a start. It will confirm that the playbook syntax is correct and that it yields a working Matrix server.

Most of what we do is not distro-specific, so testing on a bunch of distros likely just wastes time and resources for little benefit. Yes, we do have the occasional breakage like #1610, but it's not that common.

Marwel commented 2 years ago

Testing matrix-style (distros/versions, arch, vars) should not be done on every push/PR, but maybe on a cron, i.e. daily/weekly, basis. Testing the defaults, on the other hand, should be done on every push/PR.
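To illustrate the split, the triggers could look something like this (a hypothetical skeleton of two separate workflows, neither of which exists yet):

```yaml
# default-scenario workflow: runs on every push/PR
on: [push, pull_request]

# full-matrix workflow: runs on a schedule instead, e.g. weekly
# on:
#   schedule:
#     - cron: '0 3 * * 0'
```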

@spantaleev Regarding the default, what do you prefer? Do you have a good guess what the typical base system is? Debian or Ubuntu as the distro? Which version? The arch should be amd64.

The vars section should be another discussion later on, once we have something we can rely on, because the vars will need more detail.

spantaleev commented 2 years ago

[Screenshot_20220210_163229: poll results from the Matrix room]

I did a poll in our #matrix-docker-ansible-deploy:devture.com Matrix room for 4.5 hours and here are the results.

Looks like Ubuntu and Debian are tied for 1st place, at least among the poll responders so far. We surely have users on Arch Linux, SUSE, etc., but these results are likely representative enough.

Perhaps the latest LTS Ubuntu release (currently 20.04 / focal) is a good choice.

skepticalwaves commented 2 years ago

Since you're polling, I suggest a secondary poll about which architecture people are running.

luilegeant commented 2 years ago

Hello, here is a past experience to add to the bucket of "false-positive" annoyances, regarding TLS certificates. Short story: LE rate limiting is not the only potential success/failure scenario to take into account.

Longer story: I hit LE's rate limit in production a few weeks ago, before switching to their staging environment to debug the situation. Even then, it took me some time and community support to figure out that my DNS update hadn't properly reached their systems (despite online tools showing the update as done globally). Jumping to the end of the story: the error message I received from LE didn't match my side of the logs (validation route unreachable, while my reverse proxy logs were showing a couple of HTTP 200 success codes).

I would recommend being able to run the pipeline in a completely isolated mode (while keeping the possibility of a full integration test suite that everyone could run as the project grows). This would also give people the opportunity to run all tests locally, and to mess things up at their own pace (*looking at myself in the mirror*), before marking a PR as ready, or to use it as a learning/test environment.

Also, thanks a lot to all for this project :+1:

And to answer the question in the ticket title: I have an old Mac mini (Intel) with Ubuntu Server as a test platform that gets reinstalled every month or so.

spantaleev commented 2 years ago

Yes, I also think it's good to be able to test things locally, without having to push and wait for some magic to test things for you.

In the spirit of this playbook, it'd be best if the testing tools can run in a container:

- `make ansible-lint` spawns a local container that runs ansible-lint
- `make yaml-lint` spawns a local container that runs yamllint
- `make lint` runs all lint tasks

Those GitHub Actions workflows would then ideally become simple invocations of these make commands.
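The lint workflow itself could then shrink to something like this (a hypothetical sketch; neither the workflow file nor the make targets exist yet):

```yaml
# .github/workflows/lint.yml (hypothetical sketch)
name: Lint
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # `make lint` would spawn the same lint containers a contributor
      # runs locally, keeping CI and local workflows identical
      - run: make lint
```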


About Let's Encrypt, I don't think we can use their staging environment. Using it would still require that the throwaway VM server we spawn and run the playbook against is publicly accessible, so that Let's Encrypt can reach it and see if it's serving the correct challenge files.

So I think we need to go for self-signed certificates.

pratikbin commented 2 years ago

> Running the full test-matrix (all supported distros × all architectures × all tests) will be very heavy and may even make us hit some Docker Hub rate limits, so we should be careful with that.

We can use a Docker registry cache. I was testing Vagrant with it, and it was working fine. But in this case, if we're going to run different Vagrant boxes in separate CI pipelines, then it won't be of much use.
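For context, the standard registry image can act as a pull-through cache for Docker Hub with a config along these lines (a sketch; each VM's Docker daemon would then point at it via registry-mirrors):

```yaml
# config.yml for a registry:2 container running as a pull-through cache
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  # Mirror Docker Hub; pulls go through the cache instead of hitting Hub directly
  remoteurl: https://registry-1.docker.io
```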

Marwel commented 2 years ago

OK, Molecule seems to take a lot of time to configure the right way. First, it is designed for testing roles, which would mean testing each role in this repo individually. I tried it on awx-base, and the first problem I had was copying the host_vars.yml into the container. Molecule creates the containers and manages its own inventory. I tried vars_files, but without success so far. I'll have to research how to handle vars_files with Molecule.
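One thing that might help here (an untested idea based on Molecule's provisioner options, so treat the exact keys and paths as assumptions): the provisioner's inventory can link existing host_vars/group_vars directories, or define host vars inline, instead of going through vars_files:

```yaml
# Fragment of molecule/default/molecule.yml (untested sketch)
provisioner:
  name: ansible
  inventory:
    links:
      # Symlink Molecule's generated inventory to existing vars directories
      group_vars: ../../inventory/group_vars
      host_vars: ../../inventory/host_vars
    # ...or define vars for the test instance inline instead:
    # host_vars:
    #   matrix-test:
    #     matrix_domain: example.test
```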

I also tried running the playbook itself, but that turned out to be less feasible. Importing a playbook with tags seems to bring another load of problems.