Maelstrom is a fast Rust, Go, and Python test runner that runs every test in its own container. Tests are either run locally or distributed to a clustered job runner.
When a test's result is recorded (when the running time is updated), we will record whether or not the test failed.
If multiple instances of a test are run at the same time (with --repeat), and at least one instance fails, then we will record that the test failed.
Tests that failed on their previous test run will always be prioritized before other tests.
If multiple instances of a test are run at the same time (with --repeat), and that test failed last time, then all of the new test instances will get high priority.
Implementation Considerations
I think we'll want to add a "local priority" field to the JobSpec. This priority will eventually only be relevant for jobs from the same client.
When a test previously failed, we'll set the local priority to be higher (or lower?) than the default.
The broker and worker will use a priority queue that is based first on the local priority, and then on the estimated duration.
We might just want a bit per test in the test-listing.toml which indicates if the test failed last time.
We may want to ignore test timings for tests that fail. We don't want test failure times messing up timings for successes.
Acceptance Criteria
--repeat
), and at least one instance fails, then we will record that the test failed.--repeat
), and that test failed last time, then all of the new test instances will get high priority.Implementation Considerations
JobSpec
. This priority will eventually only be relevant for jobs from the same client.test-listing.toml
which indicates if the test failed last time.Definition of Done
CHANGELOG.md
updated.doc/book/head
updated.