`cargo test --all` should run tests in parallel

rocallahan commented 6 years ago

Currently if you have a workspace with a lot of crates, cargo test --all runs those tests one crate at a time. This is slow if you have a lot more cores than individual crates can saturate. It would speed things up a lot if cargo could run tests for multiple crates in parallel.

alexcrichton commented 6 years ago

This'd currently be a breaking change I think (at least amongst Cargo's own test suite). In that sense I don't think we can do this backwards compatibly by default, but we can of course provide an option to do so!

The original thinking was that each test binary would be internally parallelized, but that doesn't always have the best utilization as I'm sure you're seeing

henrikno commented 4 years ago

This also affects projects having lots of integration tests, as those are not run in parallel since they are treated as separate crates. Simple test case just to demonstrate: https://github.com/henrikno/rust-test-bench/tree/master. I expected it to take around 1 second with sufficient threads.

A workaround would be to concatenate all the tests into one megatest.rs. Or I guess you could use cargo read-manifest to enumerate them and feed them to xargs/parallel.

cargo read-manifest | jq -r '.targets[]  | select ( .kind | index("test")) | .name' | parallel -j10 cargo test --test {}

Dushistov commented 4 years ago

Yeah, I thought this this the whole idea to run cargo test against the whole workspace - to run all tests from all crates via one job server that can utilize all cpu cores.

repi commented 4 years ago

This is become more and more of a bottleneck for us also in our workspace, we have around 25 crates / integration tests and with some tests in all of them which does run really quite slowly. Could refactor everything into a single crate with the tests in it but it is a lot nicer to have the testing code close to the actual code.

I also expected cargo test --workspace to parallelize all the test running of all the crates, but if that is not possible to have as default, having it as a setting would work

rocallahan commented 4 years ago

FWIW in our project we long ago moved almost all tests to a single toplevel crate. Not only does that work around this issue, it also works around scaling issues when building lots of test binaries in a project with long crate dependency chains. See https://robert.ocallahan.org/2018/10/problems-scaling-large-multi-crate-rust.html for more information about that.

repi commented 4 years ago

Thanks, good to know. We were seeing a lot of compilation & linking overhead from small integration test binaries as well slowing things down.

For these type of use cases, having Cargo build all crates with dynamic linking (cc https://github.com/EmbarkStudios/rust-ecosystem/issues/13) that the integration tests would use would be most helpful, and doesn't need a stable ABI. But that is a whole other huge topic so I'll leave it at that, and would still be slower than a single test crate.

nagisa commented 3 years ago

This is not terribly difficult to work around by implementing your own test runner for now (I could get it done in 150 lines of python). I feel like this ought to mean its not terribly difficult to implement in cargo either, but there are caveats from my experience in implementing a custom test runner:

I believe cargo currently simply presents the output from the test runner directly. If the test suites were run in parallel, things would get quite jumbled!
- This means cargo should run the test runners with --format=json;
- And present the test output in its own way, much like it currently does for compilation progress.
There are complications with doctests:
- As per usual™;
- Similar considerations as for regular test suites, but rustdoc does not currently support outputting test results as json – feature development there will be necessary.
And custom/missing test harnesses;
- Unclear how to handle those?

nagisa commented 3 years ago

(I really want this to happen. I have multiple test suites in the private workspace that have tests running in excess of 5 minutes in my workspace – running the suites in parallel took down testing times from 30+ minutes down to a more reasonable 7 minutes or so)

nagisa commented 3 years ago

Another thing: ideally tests should be run alongside building of code in a pipelined fashion, but that complicates presentation even further.

gilescope commented 3 years ago

I'd be more than happy to have buffered stdout/stderrs/presentation if it meant that more things happened in parallel. I don't entirely buy the argument that json output is a blocker. It would be nice but I can't see how it is essential? (I have lots of cores now and want to use them)

gilescope commented 3 years ago

(Given every test in the tests subdir becomes its own crate and gets run one at a time strongly hints that we should either stop doing that or optimize that use case)

What's MVP for this feature? Is there anything stopping someone providing a PR that adds an extra flag and does this this weekend? It looks like this loop here would need to be changed to be in parallel: https://github.com/rust-lang/cargo/blob/d56b42c549dbb7e7d0f712c51b39400260d114d4/src/cargo/ops/cargo_test.rs#L85

I'd love to test out someone's PR if they've got a free weekend?

(We can leave doc tests as being serial for now. Any improvement on the existing situation is a win)

webern commented 3 years ago

I'd like to see the ability to serialize each testing binary (the current behavior) to remain available. I'm currently trying to figure out a way to manage access to expensive external resources needed by integration tests. This would be somewhat easier for binaries that run in sequence vs binaries that are running in parallel.

gilescope commented 3 years ago

Absolutely - both options should be possible and the default should be serial for now.

epage commented 1 year ago

In thinking over this, I think a way we could move forward with this

For test targets, add a new field that versions the protocol between cargo and the test harness
- When harness = true, we use the latest protocol
- When harness = false, we use the initial protocol. (what we do today)
We add a new protocol between libtest / cargo modeled after how cargo/rustc interact
- cargo passes jobserver information to libtest
- cargo enables the json output for libtest
- we update libtest's json output to offer the same level of quality of messages as we have today
- we capture all stderr output from libtest and report it verbatim
Harnesses using the new protocol are started immediately after their needed artifacts are compiled, rather than waiting for all compilation to be done to use any downtime, like linking

I think we can do this without an Edition.

EDIT:

2832 could be improved if we take over rendering of the test results

ehuss commented 1 year ago

A concern is that some tests inherently can't run in parallel (as noted above), and changing that would be a breaking change for those projects. I suspect it is relatively rare, but not insignificant.

kpreid commented 1 year ago

@ehuss Perhaps that should be addressed by giving individual tests a way to declare “I want to run sequentially vs. other tests” (or more powerfully, “other tests with this specific label”) — not just for paralleled test binaries but also at the #[test] or test crate level, to solve the entire problem (example 1, example 2 of people caring). Though it would be most pipelineable if the binaries declared this in Cargo.toml rather than needing to be queried after compilation.

epage commented 1 year ago

@ehuss the comment you linked said it is not backwards compatible but doesn't provide a reason.

Today, for tests that can't be run in parallel, isn't the solution to

Put locks in the tests which would still work as we are still running all the tests in the same process (unlike cargo nextest)
Explicitly passing --test-threads 1. To avoid requiring users to update to cargo test -j1, we could detect --test-threads 1 and run all of the tests on the same jobserver task, forcing them to be serialized.

Is there another case I'm missing? For locks, I just realized that I'm assuming jobserver just coordinates threads across processes but this would break down if instead jobserver has us run the code in another process. Is that what is happening?

ehuss commented 1 year ago

An example would be to have tests/test1.rs and tests/test2.rs that both use some external global resource (like a shared directory, a database, etc.). Today they safely run separately and independently. If they run concurrently, then they could stomp on each other.

For example, I believe in the past that's how Cargo's tests would work (separate integration tests that would share the same directory). If they ran concurrently, they would corrupt each others' files.

It would be possible to add some kind of coordination around those shared resources, but that would require changing those tests or the way people are running them.

epage commented 1 year ago

Ok, so the concern is with running multiple test binaries at the same time which won't have any way to coordinate.

And yes, multiple binaries using cargo_test macro will stomp on each other as they use the same tmp dirs

epage commented 1 year ago

It seems the only change to my proposal we need to make is the changing of the default cargo/libtest protocol field. We'd have the leave the old protocol in place and then change the default on an edition boundary with cargo fix making it explicit for existing users.

Alxandr commented 1 year ago

Slightly related, I just created (and I know others has as well) a utility to do testing across features-sets. The utility I've created allows me to do cargo featurex test and that will run all tests on all feature permutations (that are not excluded by config) on all packages in a workspace.

Currently, this just builds up a set of permutations, and loops through calling cargo test on each of them, producing the following output:

However, being able to spawn all (or a number of) the tests at once and collect the answers into one output-stream would probably be much more efficient. The compiling part and permutations part is obviously out of scope for this issue, but the cargo/libtest protocol would potentially be very interesting.

epage commented 1 year ago

Something we'll need to keep in mind is that cargo bench might benefit from some of the protocol changes but should not manage the parallel tests so the benches can run serially.

sanmai-NL commented 1 year ago

Something we'll need to keep in mind is that cargo bench might benefit from some of the protocol changes but should not manage the parallel tests so the benches can run serially.

Manage, as in? I think I understand your point but you mean it should be able to manage, then?

epage commented 1 year ago

The plan is for cargo test to orchestrate the threads through jobserver. For cargo bench, that likely won't work because the bench needs exclusive access to the machine, so instead we'll want to only run one bench binary at a time and have that bench binary track how many, if any, threads it creates.

epage commented 1 year ago

This might also help with rust-lang/rust#48532

tgross35 commented 1 year ago

I think whatever the resolution is here should probably address setup and teardown to some extent, even if there isn't anything specifically supported by cargo test. I haven't run into actual conflicts but I think that one of the problems with cargo-nextest framework is that it launches several instances of each test binary.

I think it's common to call a setup function that relies on a Once or OnceLock in each test function that needs some state (at least that's what I do) for things like creating temporary files, spawning docker containers for integration tests, or otherwise setting up machine state. If the binary is launched multiple times, that state setup happens multiple times.

I don't really know how to work around this though without libtest handling both the setup and parallelization. Iirc I think setup/teardown has been proposed before and either rejected or deferred.

With the exception of this, the way that cargo-nextest does things is pretty great. I would love if Cargo could also pick up the slow test indication, once parallelization works

epage commented 1 year ago

@tgross35 cargo's own test suite needs a single process, so we won't leave out that use case.

tgross35 commented 1 year ago

Hm. Thinking a bit further, what if libtest had an option to create a plugin-style cdylib rather than a binary? Each test shared library would get the following API:

// Pretend everything is no_mangle and extern "C",
// and uses a `#[repr(C)]` struct instead of slices

/// Put a display name with a symbol
pub struct FnInfo {
    name: &'static str,
    symbol: &'static str,
}

pub struct TestConfig {
    capture_output: bool
}

pub struct TestResult {
    success: bool,
    panicked: bool,
    duration: Duration,
    stdout: Option<Box<[u8]>>,
    stderr: Option<Box<[u8]>>,
}

pub static TESTMODULE_VERSION: u16 = 0x0001;

// Call these in order before all tests
pub static SETUP: &[FnInfo] = &[
    FnInfo { name: "libtest::global_setup", symbol: "libtest_setup" },
    FnInfo { name: "crate::mod::my_setup", symbol: "crate_mod_setup" },
];

// Call these in order after all tests
pub static TEARDOWN: &[FnInfo] = &[
    FnInfo { name: "crate::mod::my_teardown", symbol: "crate_mod_teardown" },
    FnInfo { name: "libtest::global_teardown", symbol: "libtest_teardown" },
];

// List of all tests to run. Optionally we could separate a display name and symbol name
pub static TESTS: &[FnInfo] = &[
    FnInfo { name: "crate::mod::test_foo", symbol: "test_crate_mod_test_foo" },
    FnInfo { name: "crate::othermod::test_bar", symbol: "test_crate_othermod_test_bar" },
];

// Provided by libtest
pub fn libtest_setup(TestConfig) -> TestResult { /* ... */ };
pub fn libtest_teardown(TestConfig) -> TestResult { /* ... */ };

// Thin wrapper over the User's tests that captures panics and (optionally) output
pub fn crate_mod_setup(TestConfig) -> TestResult { /* ... */ };
pub fn crate_mod_teardown(TestConfig) -> TestResult { /* ... */ };
pub fn test_crate_mod_test_foo(TestConfig) -> TestResult { /* ... */ };
pub fn test_crate_othermod_test_bar(TestConfig) -> TestResult { /* ... */ };

Then Cargo or any test runner would do the following:

dlopen the test shared library
Locate symbols SETUP, TEARDOWN and TESTS, read the names which indicate symbols in the shared library
Locate and run each symbol in SETUP
Locate each symbol in TESTS and aggregate the results. The test runner can run these in series or parallel
Locate and run each symbol in TEARDOWN
dlclose that test

So any runner can get fine control over each exact test without duplicating setup/teardown work.

It seems like this could be a nicely flexible solution, but I'm sure it has already been discussed somewhere...

Edit: brought this up on zulip to discuss a bit https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/.60rustc.20--test.60.20creating.20a.20.60cdylib.60

epage commented 1 year ago

@tgross35 people have brought this idea up in the past. My biggest concern is this makes a bigger API surface to stabilize.

tgross35 commented 1 year ago

I think that these sort of things are lower pressure since it is more like a versioned schema than a user-forward API, i.e. you just need to look for the TESTMODULE_VERSION symbol to make sure you're using the right interfaces. But it sounds like this is unlikely to be the right path forward anyway, per zulip discussion

epage commented 4 months ago

As we're talking about running test harnesses in parallel, rather than test binaries, I'm unsure how relevant this will be but some people want to be able to identify when test processes are part of the same "run", see https://internals.rust-lang.org/t/introduce-env-var-cargo-run-id-for-identifying-related-processes-threads/21134

rust-lang / cargo

`cargo test --all` should run tests in parallel #5609

2832 could be improved if we take over rendering of the test results