Support for setup and teardown scripts

Dzordzu commented 1 year ago

Run code/script before any tests / after all the tests

Usecases

Database creation / Database removal
Loading variables from external sources (ex. Hasicorp Vault)
Setting up / Destroying an external environment

Configuration suggestion/examples

With shell code

[setup]
shell = "some shell code to be executed before ANY test is started"
timeout = "100s"

[cleanup]
shell = "some shell code to be executed after all tests"
timeout = "10s"

With rust code

[setup]
rust = "path/to/setup.rs"
timeout = "10s"

[cleanup]
rust = "path/to/cleanup.rs"
timeout = "15s"

sunshowers commented 1 year ago

Help wanted: Is there anything I've missed here? Use cases and design suggestions would be welcome, especially if you've had practical experience with setup and teardown scripts in other contexts.

I agree, this would be really cool!

I don't think we can do a rust step, but simply providing a list of commands for setup and teardown would be fascinating. I think the way to do it would be to do something like:

[[profile.default.setup]]
# optional, match by test filter and platform similar to overrides
filter = "..."
platform = "..."
command = "foo bar"  # split using shell-words
timeout = "10s"  # optional

[[profile.default.teardown]]
# similar

Notes:

We should probably use the actual shell interpreter for this, similar to git/hg aliases with !. Basically, emulate Python's subprocess.Popen(shell=True). On Unix, use /bin/sh, and on Windows, use %COMSPEC%. (duct_sh can act as a reference, though we may want to just copy what it's doing rather than pull in duct as a dependency)
CWD should be the root of workspace.
Should run setup scripts right before running tests, and teardown scripts right after them.
Shouldn't run setup and teardown scripts if there are no tests to run.
Concatenate the list of scripts to run across selected and default profiles.
There are (to the best of my knowledge) no security implications. Running tests involves running arbitrary code in the user context anyway.

Some issues to work out:

How should the UI be presented? There can be multiple scripts here for different sets of tests. Maybe something like:
```
     Setup [   1/3] <setup script>
...
  Teardown [   1/2] <teardown script>
```
What environment variables should be passed down to setup scripts?
Might need a way to say "only run a script with certain profiles", though maybe that can be solved through other means (e.g. providing scripts a NEXTEST_PROFILE env var that they can filter on).
Do we run teardown scripts on Ctrl-C? Maybe we do on the first Ctrl-C attempt but not on subsequent ones.
Do we add CLI options to skip the setup and teardown scripts?
How should output be presented? Should it just be passed through or captured? Should every line be prefixed with [Setup] or similar?
How should exit codes be handled, particularly for teardown scripts? Seems like we should fail if a setup script fails, but warn if a teardown script fails + exit with non-zero.

sunshowers commented 1 year ago

(Note that I don't plan to work on this unless my employer decides to invest in this.)

sunshowers commented 1 year ago

Some more thoughts based on discussion:

Invoking a Cargo command should be done via $CARGO <command>, not cargo command. This means that we should either build support for interpolating environment variables, or use an actual shell to execute the command (though in that case, what about Windows? 😬)
There's prior art with git/hg aliases and hg hooks we should study.
A thing to think about is whether setup scripts should be executed before tests are listed or after. I think after to start, since that gives us the full information. However I can imagine some custom test harnesses benefiting from running setup scripts before tests are listed.
Can we just do setup scripts at first, and teardown scripts in the future?

Dzordzu commented 1 year ago

We should use env variables without requirement of any shell
I'm not familiar with Mercurial. What kind of 'art' do you mean?
IMHO after tests listing, but I'm not opinionated
Certainly!

sunshowers commented 1 year ago

I'm not familiar with Mercurial. What kind of 'art' do you mean?

"Prior art" is originally a term from patent law. In this context it just means previous attempts to solve similar problems.

Dzordzu commented 1 year ago

Ah, I knew it under the other name (state of the art). Everyday learning something new ;)

sunshowers commented 1 year ago

My employer has decided to invest in this, at least to get it to an experimental stage. Hoping to get an experimental version out in the next 2 weeks.

sunshowers commented 1 year ago

Initial support for setup scripts is in #977.

sunshowers commented 1 year ago

OK, just released experimental support for setup scripts with nextest 0.9.59. Documentation is at https://nexte.st/book/setup-scripts.html.

@Dzordzu @alextes @ns-sjorgedeaguiar (since you liked the post) and others, please try it out and provide feedback in the tracking issue #978!

Veetaha commented 11 months ago

Do we run teardown scripts on Ctrl-C? Maybe we do on the first Ctrl-C attempt but not on subsequent ones.

I'm for running teardown scripts regardless if the setup script succeded or not and regardless if it was cancelled or not. The docs should be very explicit about teardown scripts being idempotent, meaning they should be prepared to be executed several times, and even if the setup script didn't run to completion, and the teardown script should not fail if the resource it needs to cleanup already was deleted (by the prev. run of the teardown script, for example).

How should output be presented? Should it just be passed through or captured? Should every line be prefixed with [Setup] or similar?

For me, it's fine to leave the output uncaptured, and no prefixing is required. This would be relevant if there was an ability to run several setup scripts in parallel. I'm not sure nextest needs that level of complexity, but time will tell.

Do we add CLI options to skip the setup and teardown scripts?

Right now, I personally don't have such a use case. It may make sense to wait until there is one?

How should exit codes be handled, particularly for teardown scripts? Seems like we should fail if a setup script fails, but warn if a teardown script fails + exit with non-zero.

I guess this should be configurable for teardown scripts at least. For example, if the teardown needs to shutdown a local docker container and it fails to do that then the testing should be considered "partially successful", a warning should be printed and the exit status should be zero, because the cost of leaking a local docker container isn't high, especially when the docker daemon is shutdown on an ephemeral CI runner after the the tests anyway.

However, there may be cases like running terraform destroy to delete an expensive EC2 instance in AWS after the tests are done. If such teardown fails users should know about it immediately. An error-level log and a non-zero exit status will be a good primitive way of dealing with that.

So I suggest that by default, nextest exists with non-zero and an error log with the teardown script fails, but there should be some config knob like

[script.errors]
# If exit code is not specified this means the following action is performed for all non-zero exit codes
action = "warn"

# Override handling of a specific exit code
[script.errors]
code = 255
action = "fail"

Several more things to add. It usually makes sense to run the teardown script right before the setup script to ensure the environment is clean before the setup, otherwise the setup would try to create the resource that already exists.

It should be easy to implement this for the users by moving their scripts to a file and invoking the teardown script at the beginning of the setup script. Therefore I'm not sure nextest needs a config-level capability to configure script dependencies such that the setup script would depend on the teardown script. . but it's something to keep in mind.

It makes sense to have retry configs for the scripts just like for tests. For example, we do terraform apply during the tests setup (currently inside of each test in Rust) and that may fail in many spectacular ways, so we retry it in code.

What about splitting before_all and before_each scripts? For example, all tests may depend on one docker container that would be created just ones in before_all, but what if every script needs an isolated resource for their own as well? Having such setup/teardown logic outside of the process will be more robust because we are guaranteed the resources in before_each will be teardown even if the tests abort or they don't properly handle panics/early returns.

serle commented 9 months ago

It would be great to support before_all, before_each, after_each and after_all pattern in the same way that the jest runner does. However generally in that runner the before_all/before_each is able to initialise some context that can then be used in the test functions. In the case of nextest there does not seem to be a way to access any context set by the runner itself.

sunshowers commented 9 months ago

With setup scripts, the main way to pass in context is via environment variables.

before_each and after_each aren't really supportable with the current way nextest works. Most people either initialize that context by hand or use a proc macro for that.

jayvdb commented 6 months ago

Hi, thanks for adding this experimental feature which I was able to use to solve the problem described at https://github.com/nextest-rs/nextest/issues/1466.

I have a script

#!/bin/bash

# Exit with 1 if NEXTEST_ENV isn't defined.
if [ -z "$NEXTEST_ENV" ]; then
    exit 1
fi

# Exit with 1 if $1 isnt set.
if [ -z "$1" ]; then
    exit 1
fi

# Write out an environment variable to $NEXTEST_ENV.
echo "RUST_MIN_STACK=$1" >> "$NEXTEST_ENV"

And then a bunch of limits added

# Default
[script.add-stack-limit-80000]
command = "scripts/add-stack-limit.sh 80000"

[[profile.default.scripts]]
filter = "all()"
setup = "add-stack-limit-80000"

[script.add-stack-limit-700000]
command = "scripts/add-stack-limit.sh 700000"

[[profile.default.scripts]]
filter = "package(some-expensive-lib)"
setup = "add-stack-limit-700000"

As I have about 12 [script.add-stack-limit-xxxxx], it would be nice if setup = "add-stack-limit-700000" could be written as setup = ["add-stack-limit", "700000"]

nextest-rs / nextest