Create new commands that would build the docker images and run the e2e tests separately

tatilepizs commented 2 weeks ago

Describe the issue When we are creating new e2e tests or debugging the existing ones (and not modifying the cht-core code), it is really frustrating that every time we run the e2e tests we need to download and build completely the docker images. It takes around 2 minutes for a test to start running, which is a lot of time when we are debugging, for example, and just need to try small things in each iteration.

Describe the improvement you'd like We would like to create two new commands, one that builds the docker images and one that only executes the wdio e2e tests.

The first command, the one that build the images, would be in charge of deleting always the images associated at the branch (if they exist) and then creating them. if we remove the ${buildTime} from this line we can use the same images every time that we run the tests using the same branch, and by deleting the images at the beginning we would make sure that we are using the correct images, in case something changes in the cht-core code. The second command would be in charge of only executing the e2e tests, and since everything will be already built, it should just take seconds to start seeing the execution of the test.

We will also leave the command wdio-local that would build and execute, just as it is right now.

Describe alternatives you've considered

tatilepizs commented 2 weeks ago

@dianabarsan, I would love to hear your thoughts about this idea. Do you think that the approach that I am describing is feasible? Is there something else that I should consider?

cc: @jkuester

dianabarsan commented 2 weeks ago

I think I already had a branch that did this. I'll share.

dianabarsan commented 1 week ago

Ok, it's not a branch, because you don't need new commands.

All you need to do is:

run local tests so images get created
open tests/cht-core-test.yml and see the image tag. it's going to look like image: medicmobile/cht-haproxy:4.14.0-dev.1731328820240. Take the tag and export it as VERSION. in my example: export VERISON=4.14.0-dev.1731328820240
run the CI command for the same test, instead of the local one (so, instead of npm run wdio-local you would run npm run ci-webdriver-default

jkuester commented 1 week ago

@dianabarsan is there a compelling reason that we need to timestamp the image version? The main benefit I can think of is that it helps ensure we don't locally run the e2e tests against a stale version of the docker image. Is there something else I am missing?

The downsides of timestamping the image version include:

As a developer, it is just continuously pumping new docker images onto my local system. I assume the docker layering allows these near-identical images to exist with very little extra disk space, but it is still annoying.
For QA engineers (or anyone else trying to re-run the e2e tests without rebuilding images), it requires the manual step of checking the current image version and then setting the VERSION envar.

IMHO, re-running the e2e tests without rebuilding the images should be a workflow with top-level support. As a developer I am frequently re-running tests while writing them. It would be easier to script this if the image VERSION was deterministic (and not time-stamped). However, even if we keep the time-stamping, I still think it would be valuable to have an npm script that would programatically set the VERSION when necessary.

dianabarsan commented 1 week ago

re-running the e2e tests without rebuilding the images should be a workflow with top-level support

Sure, I agree.

If we neither timestamp the images NOR rebuild the images every time we run tests, then the risk of running tests over unknown images is really high. How do you even know what's in the image? Do you start the container and inspect it to see what changes are in it? Especially if you run multiple git worktrees and run tests in parallel.

This is one argument for timestamping, that if one uses different worktrees (like me), then you will have different images for them - which you can reuse with the VERSION env. Now, the worktrees don't necessarily need a timestamp (though this is most correct), we could use a hash of the local folder path instead to uniquely identify the reusable worktree images.

It's a shame there is no "only build if changed" docker command :(

I still think it would be valuable to have an npm script that would programatically set the VERSION when necessary.

What can be done is:

set the VERSION env variable
run the local e2e test that builds images. because of the VERSION env variable you set before, the image name will be deterministic.
for subsequent tests, run the ci e2e test commands.

export VERSION=mytest
npm run integration-all-local
npm run ci-integration-all
npm run ci-integration-all
npm run ci-integration-all
npm run ci-integration-all
npm run ci-integration-all

This way you don't need to fish around get timestamps from generated docker compose files.

dianabarsan commented 1 week ago

What if .... instead of timestamps we actually used the local checked out branch name? That way the version is deterministic, we can include it in npm scripts and it will also work with different worktrees or when working on multiple branches in parallel?

VERSION=$(git branch --show-current) npm run integration-all-local
VERSION=$(git branch --show-current) npm run ci-integration-all

And these can be our npm commands for local runs.

jkuester commented 1 week ago

If we neither timestamp the images NOR rebuild the images every time we run tests, then the risk of running tests over unknown images is really high.

Agreed. My vote would be that the behavior of wdio-local (and similar commands) remains the same, where the images are rebuilt for each run. Then, we can just use the ci-webdriver-default commands to run the tests without rebuilding images (or add new commands if there is some reason the ci ones will not work).

Ultimately it is up to the developer to be aware of the current state of their repo/images, but this seems to offer a good balance of a safe default happy path where everything gets built while allowing for a shortcut when someone knows what they are doing....

What if .... instead of timestamps we actually used the local checked out branch name?

This is my thought exactly! Specifically, my proposal is that we update the get-version.js script code so that the default version value is the current git branch name. Then, we update the ci test commands to include:

    "ci-webdriver-default": "export VERSION=${VERSION:-$(node ./scripts/build/get-version.js)} wdio run ./tests/e2e/default/wdio.conf.js",

This way, when the VERSION envar is provided to the ci commands (e.g. by the GitHub actions) that value will be used. But, when there is no VERSION, it will call the get-version script and use the default value (aka the branch name).

@tatilepizs what are your thoughts here?

tatilepizs commented 1 week ago

Thank you both for this discussion, I like the idea of using the branch name for the images, that would help a lot. But, I was trying the steps that Diana suggested here and they are not working for me, after I run npm run ci-webdriver-default it gets retrying and never run the tests, it just shows this message:

API check failed, trying again in 1 second
AggregateError
Checking API

Here is the video of the steps:

video

@dianabarsan is there something I am missing?

dianabarsan commented 1 week ago

It looks like your docker is trying to pull the images and fails? Can you please post logs @tatilepizs ? It's easier to follow than a video. I wonder if this is related to docker desktop.

tatilepizs commented 1 week ago

This is all I can see:

npm run ci-webdriver-default

> medic@4.14.0 ci-webdriver-default
> wdio run ./tests/e2e/default/wdio.conf.js

Execution of 1 workers started at 2024-11-13T22:51:30.751Z

(node:12628) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

 Volume cht-e2e_cht-ssl  Removing

 Volume cht-e2e_cht-credentials  Removing

 Volume cht-e2e_cht-ssl  Removed
 Volume cht-e2e_cht-credentials  Removed

time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-13T16:51:32-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

 nginx Pulling 
 api Pulling 
 couchdb-2.local Pulling 
 couchdb-3.local Pulling 

 couchdb-1.local Pulling 
 healthcheck Pulling 
 sentinel Pulling 
 haproxy Pulling 

 sentinel Error manifest for medicmobile/cht-sentinel:4.14.0-dev.1731538290620 not found: manifest unknown: manifest unknown

 haproxy Error context canceled
 nginx Error context canceled
 healthcheck Error context canceled
 couchdb-1.local Error context canceled
 api Error context canceled
 couchdb-2.local Error context canceled

 couchdb-3.local Error context canceled

Error response from daemon: manifest for medicmobile/cht-sentinel:4.14.0-dev.1731538290620 not found: manifest unknown: manifest unknown

time="2024-11-13T16:51:34-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

time="2024-11-13T16:51:34-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

time="2024-11-13T16:51:34-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

Checking API
(node:12628) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
API check failed, trying again in 1 second
AggregateError
Checking API
API check failed, trying again in 1 second
AggregateError
Checking API
API check failed, trying again in 1 second
...

dianabarsan commented 1 week ago

Oh, I think I saw what the problem was, in the video.

Your exported variable is "VERISON" not "VERSION" :)

dianabarsan commented 1 week ago

Excerpt from the video.

tatilepizs commented 1 week ago

Oh no 🤦 I am sorry about this. Thanks Diana

dianabarsan commented 1 week ago

Did it work without the typo?

tatilepizs commented 6 days ago

Yes, it didn't download the images, thank you 😊

I was thinking if there is a way to leave the containers up and running the first time, so the next time, it just uses the containers that are already up and doesn't need to create the containers again, that would be awesome because the tests will be running in just a couple of seconds

dianabarsan commented 6 days ago

@tatilepizs I used to leave containers running for further testing, but @jkuester pushed a commit that killed them :)

https://github.com/medic/cht-core/commit/cf4a1dde1fe2cca28f201c65f1b70bcd2dda6f6b

tatilepizs commented 6 days ago

😅 If I cancel the process, I will have the containers up, but when I run the command npm run ci-webdriver-default it will stop and remove all the containers and then create them again

 npm run ci-webdriver-default             

> medic@4.14.0 ci-webdriver-default
> wdio run ./tests/e2e/default/wdio.conf.js

Execution of 1 workers started at 2024-11-15T17:16:40.386Z

(node:91316) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
time="2024-11-15T11:16:42-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-15T11:16:42-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-15T11:16:42-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

 Container cht-e2e-healthcheck-1  Stopping
 Container cht-e2e-couchdb-2.local-1  Stopping
 Container cht-e2e-sentinel-1  Stopping
 Container cht-e2e-couchdb-3.local-1  Stopping
 Container cht-e2e-couchdb-1.local-1  Stopping
 Container cht-e2e-nginx-1  Stopping
 Container cht-e2e-sentinel-1  Stopped
 Container cht-e2e-sentinel-1  Removing
 Container cht-e2e-sentinel-1  Removed
 Container cht-e2e-healthcheck-1  Stopped
 Container cht-e2e-healthcheck-1  Removing
 Container cht-e2e-healthcheck-1  Removed
 Container cht-e2e-couchdb-2.local-1  Stopped
 Container cht-e2e-couchdb-2.local-1  Removing
 Container cht-e2e-couchdb-2.local-1  Removed
 Container cht-e2e-couchdb-1.local-1  Stopped
 Container cht-e2e-couchdb-1.local-1  Removing
 Container cht-e2e-couchdb-1.local-1  Removed
 Container cht-e2e-couchdb-3.local-1  Stopped
 Container cht-e2e-couchdb-3.local-1  Removing
 Container cht-e2e-couchdb-3.local-1  Removed
 Container cht-e2e-nginx-1  Stopped
 Container cht-e2e-nginx-1  Removing
 Container cht-e2e-nginx-1  Removed
 Container cht-e2e-api-1  Stopping
 Container cht-e2e-api-1  Stopped
 Container cht-e2e-api-1  Removing
 Container cht-e2e-api-1  Removed
 Container cht-e2e-haproxy-1  Stopping
 Container cht-e2e-haproxy-1  Stopped
 Container cht-e2e-haproxy-1  Removing
 Container cht-e2e-haproxy-1  Removed
 Volume cht-e2e_cht-ssl  Removing
 Volume cht-e2e_cht-credentials  Removing
 Network cht-net-e2e  Removing
 Volume cht-e2e_cht-credentials  Removed
 Volume cht-e2e_cht-ssl  Removed
 Network cht-net-e2e  Removed

time="2024-11-15T11:16:47-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-15T11:16:47-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."
time="2024-11-15T11:16:47-06:00" level=warning msg="The \"COUCHDB_UUID\" variable is not set. Defaulting to a blank string."

 Network cht-net-e2e  Creating
 Network cht-net-e2e  Created
 Volume "cht-e2e_cht-ssl"  Creating
 Volume "cht-e2e_cht-ssl"  Created
 Volume "cht-e2e_cht-credentials"  Creating
 Volume "cht-e2e_cht-credentials"  Created
 Container cht-e2e-haproxy-1  Creating
 Container cht-e2e-couchdb-1.local-1  Creating
 Container cht-e2e-healthcheck-1  Creating
 Container cht-e2e-couchdb-2.local-1  Creating
 Container cht-e2e-couchdb-3.local-1  Creating
 Container cht-e2e-haproxy-1  Created
 Container cht-e2e-sentinel-1  Creating
 Container cht-e2e-api-1  Creating
 Container cht-e2e-healthcheck-1  Created
 Container cht-e2e-couchdb-2.local-1  Created
 Container cht-e2e-couchdb-3.local-1  Created
 Container cht-e2e-couchdb-1.local-1  Created
 Container cht-e2e-api-1  Created
 Container cht-e2e-nginx-1  Creating
 Container cht-e2e-sentinel-1  Created
 Container cht-e2e-nginx-1  Created
 Container cht-e2e-couchdb-2.local-1  Starting
 Container cht-e2e-couchdb-1.local-1  Starting
 Container cht-e2e-healthcheck-1  Starting
 Container cht-e2e-couchdb-3.local-1  Starting
 Container cht-e2e-haproxy-1  Starting
 Container cht-e2e-healthcheck-1  Started
 Container cht-e2e-couchdb-2.local-1  Started
 Container cht-e2e-couchdb-1.local-1  Started
 Container cht-e2e-haproxy-1  Started
 Container cht-e2e-sentinel-1  Starting
 Container cht-e2e-api-1  Starting
 Container cht-e2e-couchdb-3.local-1  Started
 Container cht-e2e-sentinel-1  Started
 Container cht-e2e-api-1  Started
 Container cht-e2e-nginx-1  Starting
 Container cht-e2e-nginx-1  Started
...

jkuester commented 6 days ago

@jkuester pushed a commit that killed them :)

😅 💀

I was thinking if there is a way to leave the containers up and running the first time, so the next time, it just uses the containers that are already up and doesn't need to create the containers again

This is where I would draw the line and say it is not a good idea to add first-class support for this workflow! 😅 IMHO, you need to drop all the containers and reset things between test runs to better guarantee the reproducibility of test results. The docker images are static and cannot be affected by things that happen during a test run. But the containers are not! It might be possible (or likely?) that the state of the CHT instance in the containers is slightly different after a test run. Then subsequent test runs would not be running with the exact same environment as a fresh run.

tatilepizs commented 6 days ago

🤔 hmm, you are right Josh, I didn't think about that

dianabarsan commented 4 days ago

@tatilepizs are you happy with the :

VERSION=$(git branch --show-current) npm run integration-all-local
VERSION=$(git branch --show-current) npm run ci-integration-all

solution?

tatilepizs commented 4 days ago

Yes, I think it will be a great improvement to the way that we execute the tests

medic / cht-core

Create new commands that would build the docker images and run the e2e tests separately #9630