wasmCloud / wasmCloud

An extendible wasm host enabling wasm-native orchestration; run components on any cloud, any k8s, or any edge - including your own.
https://wasmcloud.com
Apache License 2.0

[FEATURE] Adopt testcontainers for managing integration testing dependencies #2196

Open joonas opened 1 month ago

joonas commented 1 month ago

Affected project(s)

Is your feature request related to a problem? Please describe.

When developing/testing against existing parts of wasmCloud locally, I've found myself running into cases where I am surprised by the setup needed to get the tests to run.

Depending on which part of the codebase you're working with, you may need to:

  1. Set up environment variables (nats, minio, redis, vault) to point to specific binaries that will then be executed and cleaned up.
  2. Invoke Docker Compose

(there may be more, these are the two examples I've come across where I was surprised)

In either case, it's not immediately obvious why the tests you tried to run are failing, because the errors you get are not wrapped in our own handling (or better, pre-empted by checking that the dependencies exist before we try to invoke them).

Describe the solution you'd like

Instead of having ambiguous dependencies, I would like to propose that as a project we standardize on virtualizing our dependencies by making use of the excellent Testcontainers project (see also Rust impl).

This would have the following benefits over the status quo:

  1. It would reduce the number of dependencies on the developer's end from N to 1, in this case Docker.
  2. It would allow us to better express (and validate) what version of a given dependency our code was developed against, because we could (and absolutely should) require each dependency to be pulled in based on a versioned tag (as opposed to allowing latest unless it's absolutely necessary).
  3. It would make it easier to develop more isolated tests that can run in parallel without having to worry about the underlying service instances conflicting (or being forced to run tests serially because of a shared environment, whether that's ports in use, locations on disk, service configuration, or otherwise).
  4. It should become easier to introduce new dependencies when they are needed, e.g. when developing a new provider for a service that may be onerous to run on your local machine. Some examples that come to mind are in the JVM ecosystem, whether Kafka, ElasticSearch or anything else really.
  5. It would remove the need to develop custom code to supervise processes and manage the cleanup after tests are done as this will be handled by Testcontainers and/or Docker.

Now, there are of course some downsides to this approach as well:

  1. We're now dependent on Docker being available on the developer machine.
    • To put a positive spin on this, we can now write a test to check whether Docker connectivity is available and provide a nice error message to let the developer know what they need to be able to run the tests, and do that only once across the board.
  2. This will require network connectivity for the initial setup to download container images.
    • This doesn't necessarily feel any different from what's already required in terms of downloading the binaries locally and/or running the Docker Compose setup, but it's worth calling out.
  3. We may need to spend a bit of energy towards making sure we have a working docker environment on CI (if we choose to use this in CI) across the various platforms we support.
    • This feels tractable, and there are pre-existing examples of how we could set things up for Mac and Windows (the two platforms that need to be addressed).
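The "check Docker connectivity once and fail with a nice message" idea from downside 1 could be sketched roughly like this (hypothetical function names; only the standard library is used):

```rust
use std::process::Command;

/// Preflight check: verify the container runtime is reachable before any
/// container-based integration test runs, so developers see one friendly
/// error instead of N opaque connection failures.
fn runtime_available(binary: &str) -> Result<(), String> {
    match Command::new(binary).arg("info").output() {
        // `docker info` succeeded: the daemon is up and reachable.
        Ok(out) if out.status.success() => Ok(()),
        // The binary exists but the daemon is not responding.
        Ok(_) => Err(format!(
            "`{binary} info` failed: is the {binary} daemon running?"
        )),
        // The binary itself could not be invoked at all.
        Err(e) => Err(format!(
            "could not invoke `{binary}` ({e}): install Docker (or a \
             compatible runtime) to run the integration tests"
        )),
    }
}

fn main() {
    if let Err(msg) = runtime_available("docker") {
        eprintln!("skipping container-based tests: {msg}");
    }
}
```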

I'm sure there's more to cover here, so any feedback on what I might be missing would be helpful.

Describe alternatives you've considered

Stick to the status quo.

Additional context

I believe this is an important consideration from a project perspective, because it is in our interest to do everything we can to lower the barrier of entry for new contributors.

While I don't have any hard data to point you to, drawing from my own experience and anecdotes I've heard from others, the more involved it is to set up your local environment to contribute, the less likely folks are going to be inclined to contribute.

And finally, once folks have gotten over the barrier to contributing, we want to help them test and ensure the quality of their contributions locally before pushing things up.

I don't believe that our status quo is newcomer-friendly or necessarily long-term sustainable as the project continues to grow, so setting a good foundation now is important.

I'll also add that @thomastaylor312 pointed out that the oci-distribution project has adopted testcontainers for its integration testing setup.

cc @rvolosatovs @brooksmtownsend @thomastaylor312

markkovari commented 2 weeks ago

Hey @joonas I started already looking into this. You can assign it to me if you want. Thanks

joonas commented 2 weeks ago

@markkovari that's fantastic, I'd love to have you run with this, but before you jump on it, I think it would be good to hear from @rvolosatovs, @brooksmtownsend and/or @thomastaylor312 on whether we want to move forward with this change and, if so, whether there are any potential considerations to take into account while doing so.

rvolosatovs commented 2 weeks ago

As far as I know, no host/runtime tests currently require Docker; they use binaries directly. This has significant benefits, because the exact binary versions are locked in the Nix flake lock, meaning the development environment is deterministic and consistent across CI and local development (or at least it can be made so, by choosing to use Nix locally). Most importantly, the host is tested within the Nix sandbox, and because all of these dependencies are deterministically locked in the flake, Nix can run tests once and cache the result until either the dependencies or the implementation changes.

Adopting testcontainers, docker/podman, or any other technology that relies on networking at test runtime would make this caching impossible for host tests, unless we pre-populate the OCI runtime's local store with deterministic artifacts, i.e. pin the versions of these containers to specific content hashes. If we can pin to hashes and propagate this setup into the Nix sandbox, I have no objection to this; however, that seems like a maintenance nightmare, since we'd either need to figure out some automation to update these container pins or do so manually. We already have Nix flake update automation, which updates everything we use for builds (the Rust toolchain, any tooling used, and all the binary tools we use for testing) in a single step.

As another, purely subjective data point, I have recently switched to a Mac from being a long time Linux user and I stopped using OCI runtimes at all, because IMO user experience of using these tools on Mac compared to Linux is abysmal with a highly fragmented ecosystem, which, of course, makes sense, since these tools run in VMs on platforms other than Linux.

My view on this is that the host tests (i.e. tests in tests/*) should be able to use binaries from the environment, locked in the Nix flake, just like they do now. I'm more than happy to provide an alternative or a fallback to an OCI runtime of any sort if the binary is not found in the environment.
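The binary-first, container-fallback resolution described here could look roughly like the following (hypothetical names; only the standard library is used, and the container branch is left as a stub):

```rust
use std::env;
use std::path::PathBuf;

/// Look for a binary on PATH, e.g. one provided by the Nix flake devshell.
fn find_in_path(binary: &str) -> Option<PathBuf> {
    env::var_os("PATH").and_then(|paths| {
        env::split_paths(&paths)
            .map(|dir| dir.join(binary))
            .find(|candidate| candidate.is_file())
    })
}

/// How a test should obtain a given dependency.
enum TestBackend {
    /// Run the binary from the environment directly (fast, Nix-cacheable).
    NativeBinary(PathBuf),
    /// Fall back to starting it via an OCI runtime (e.g. testcontainers).
    Container,
}

/// Prefer the environment-provided binary; only fall back to a container
/// when the binary is absent.
fn resolve_backend(binary: &str) -> TestBackend {
    match find_in_path(binary) {
        Some(path) => TestBackend::NativeBinary(path),
        None => TestBackend::Container,
    }
}
```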

Again, this is for host tests specifically - I don't think "heavy" things like Postgres etc. belong in these tests at all - host tests should run fast and have a minimal set of dependencies. Provider integration tests and the like are a different story completely, and for those I completely agree with using something based on OCI, which would simplify setup and replicate more closely what users would actually do in production environments. These tests would then be ignored in the Nix sandbox and have separate CI jobs, triggered by changes in relevant paths/files (this automation is not set up, AFAIK).

Basically, what I'm trying to say here is that adopting testcontainers or any other OCI-based solution is not a trivial effort and would require CI automation to come with it. I agree with this change, as long as host tests do not require an OCI runtime.