Open rocallahan opened 6 years ago
This'd currently be a breaking change I think (at least amongst Cargo's own test suite). In that sense I don't think we can do this backwards compatibly by default, but we can of course provide an option to do so!
The original thinking was that each test binary would be internally parallelized, but that doesn't always have the best utilization as I'm sure you're seeing
This also affects projects having lots of integration tests, as those are not run in parallel since they are treated as separate crates. Simple test case just to demonstrate: https://github.com/henrikno/rust-test-bench/tree/master. I expected it to take around 1 second with sufficient threads.
A workaround would be to concatenate all the tests into one megatest.rs. Or I guess you could use cargo read-manifest
to enumerate them and feed them to xargs/parallel.
cargo read-manifest | jq -r '.targets[] | select ( .kind | index("test")) | .name' | parallel -j10 cargo test --test {}
Yeah, I thought this this the whole idea to run cargo test
against the whole workspace - to run all tests from all crates via one job server that can utilize all cpu cores.
This is become more and more of a bottleneck for us also in our workspace, we have around 25 crates / integration tests and with some tests in all of them which does run really quite slowly. Could refactor everything into a single crate with the tests in it but it is a lot nicer to have the testing code close to the actual code.
I also expected cargo test --workspace
to parallelize all the test running of all the crates, but if that is not possible to have as default, having it as a setting would work
FWIW in our project we long ago moved almost all tests to a single toplevel crate. Not only does that work around this issue, it also works around scaling issues when building lots of test binaries in a project with long crate dependency chains. See https://robert.ocallahan.org/2018/10/problems-scaling-large-multi-crate-rust.html for more information about that.
Thanks, good to know. We were seeing a lot of compilation & linking overhead from small integration test binaries as well slowing things down.
For these type of use cases, having Cargo build all crates with dynamic linking (cc https://github.com/EmbarkStudios/rust-ecosystem/issues/13) that the integration tests would use would be most helpful, and doesn't need a stable ABI. But that is a whole other huge topic so I'll leave it at that, and would still be slower than a single test crate.
This is not terribly difficult to work around by implementing your own test runner for now (I could get it done in 150 lines of python). I feel like this ought to mean its not terribly difficult to implement in cargo either, but there are caveats from my experience in implementing a custom test runner:
--format=json
;(I really want this to happen. I have multiple test suites in the private workspace that have tests running in excess of 5 minutes in my workspace – running the suites in parallel took down testing times from 30+ minutes down to a more reasonable 7 minutes or so)
Another thing: ideally tests should be run alongside building of code in a pipelined fashion, but that complicates presentation even further.
I'd be more than happy to have buffered stdout/stderrs/presentation if it meant that more things happened in parallel. I don't entirely buy the argument that json output is a blocker. It would be nice but I can't see how it is essential? (I have lots of cores now and want to use them)
(Given every test in the tests subdir becomes its own crate and gets run one at a time strongly hints that we should either stop doing that or optimize that use case)
What's MVP for this feature? Is there anything stopping someone providing a PR that adds an extra flag and does this this weekend? It looks like this loop here would need to be changed to be in parallel: https://github.com/rust-lang/cargo/blob/d56b42c549dbb7e7d0f712c51b39400260d114d4/src/cargo/ops/cargo_test.rs#L85
I'd love to test out someone's PR if they've got a free weekend?
(We can leave doc tests as being serial for now. Any improvement on the existing situation is a win)
I'd like to see the ability to serialize each testing binary (the current behavior) to remain available. I'm currently trying to figure out a way to manage access to expensive external resources needed by integration tests. This would be somewhat easier for binaries that run in sequence vs binaries that are running in parallel.
Absolutely - both options should be possible and the default should be serial for now.
In thinking over this, I think a way we could move forward with this
cargo
and the test harness
harness = true
, we use the latest protocolharness = false
, we use the initial protocol. (what we do today)I think we can do this without an Edition.
EDIT:
A concern is that some tests inherently can't run in parallel (as noted above), and changing that would be a breaking change for those projects. I suspect it is relatively rare, but not insignificant.
@ehuss Perhaps that should be addressed by giving individual tests a way to declare “I want to run sequentially vs. other tests” (or more powerfully, “other tests with this specific label”) — not just for paralleled test binaries but also at the #[test]
or test crate level, to solve the entire problem (example 1, example 2 of people caring). Though it would be most pipelineable if the binaries declared this in Cargo.toml
rather than needing to be queried after compilation.
@ehuss the comment you linked said it is not backwards compatible but doesn't provide a reason.
Today, for tests that can't be run in parallel, isn't the solution to
cargo nextest
)--test-threads 1
. To avoid requiring users to update to cargo test -j1
, we could detect --test-threads 1
and run all of the tests on the same jobserver task, forcing them to be serialized.Is there another case I'm missing? For locks, I just realized that I'm assuming jobserver just coordinates threads across processes but this would break down if instead jobserver has us run the code in another process. Is that what is happening?
An example would be to have tests/test1.rs
and tests/test2.rs
that both use some external global resource (like a shared directory, a database, etc.). Today they safely run separately and independently. If they run concurrently, then they could stomp on each other.
For example, I believe in the past that's how Cargo's tests would work (separate integration tests that would share the same directory). If they ran concurrently, they would corrupt each others' files.
It would be possible to add some kind of coordination around those shared resources, but that would require changing those tests or the way people are running them.
Ok, so the concern is with running multiple test binaries at the same time which won't have any way to coordinate.
And yes, multiple binaries using cargo_test
macro will stomp on each other as they use the same tmp dirs
It seems the only change to my proposal we need to make is the changing of the default cargo/libtest protocol field. We'd have the leave the old protocol in place and then change the default on an edition boundary with cargo fix
making it explicit for existing users.
Slightly related, I just created (and I know others has as well) a utility to do testing across features-sets. The utility I've created allows me to do cargo featurex test
and that will run all tests on all feature permutations (that are not excluded by config) on all packages in a workspace.
Currently, this just builds up a set of permutations, and loops through calling cargo test
on each of them, producing the following output:
However, being able to spawn all (or a number of) the tests at once and collect the answers into one output-stream would probably be much more efficient. The compiling part and permutations part is obviously out of scope for this issue, but the cargo/libtest protocol would potentially be very interesting.
Something we'll need to keep in mind is that cargo bench
might benefit from some of the protocol changes but should not manage the parallel tests so the benches can run serially.
Something we'll need to keep in mind is that
cargo bench
might benefit from some of the protocol changes but should not manage the parallel tests so the benches can run serially.
Manage, as in? I think I understand your point but you mean it should be able to manage, then?
The plan is for cargo test
to orchestrate the threads through jobserver. For cargo bench
, that likely won't work because the bench needs exclusive access to the machine, so instead we'll want to only run one bench binary at a time and have that bench binary track how many, if any, threads it creates.
This might also help with rust-lang/rust#48532
I think whatever the resolution is here should probably address setup and teardown to some extent, even if there isn't anything specifically supported by cargo test
. I haven't run into actual conflicts but I think that one of the problems with cargo-nextest
framework is that it launches several instances of each test binary.
I think it's common to call a setup function that relies on a Once
or OnceLock
in each test function that needs some state (at least that's what I do) for things like creating temporary files, spawning docker containers for integration tests, or otherwise setting up machine state. If the binary is launched multiple times, that state setup happens multiple times.
I don't really know how to work around this though without libtest
handling both the setup and parallelization. Iirc I think setup/teardown has been proposed before and either rejected or deferred.
With the exception of this, the way that cargo-nextest
does things is pretty great. I would love if Cargo could also pick up the slow test indication, once parallelization works
@tgross35 cargo's own test suite needs a single process, so we won't leave out that use case.
Hm. Thinking a bit further, what if libtest
had an option to create a plugin-style cdylib
rather than a binary? Each test shared library would get the following API:
// Pretend everything is no_mangle and extern "C",
// and uses a `#[repr(C)]` struct instead of slices
/// Put a display name with a symbol
pub struct FnInfo {
name: &'static str,
symbol: &'static str,
}
pub struct TestConfig {
capture_output: bool
}
pub struct TestResult {
success: bool,
panicked: bool,
duration: Duration,
stdout: Option<Box<[u8]>>,
stderr: Option<Box<[u8]>>,
}
pub static TESTMODULE_VERSION: u16 = 0x0001;
// Call these in order before all tests
pub static SETUP: &[FnInfo] = &[
FnInfo { name: "libtest::global_setup", symbol: "libtest_setup" },
FnInfo { name: "crate::mod::my_setup", symbol: "crate_mod_setup" },
];
// Call these in order after all tests
pub static TEARDOWN: &[FnInfo] = &[
FnInfo { name: "crate::mod::my_teardown", symbol: "crate_mod_teardown" },
FnInfo { name: "libtest::global_teardown", symbol: "libtest_teardown" },
];
// List of all tests to run. Optionally we could separate a display name and symbol name
pub static TESTS: &[FnInfo] = &[
FnInfo { name: "crate::mod::test_foo", symbol: "test_crate_mod_test_foo" },
FnInfo { name: "crate::othermod::test_bar", symbol: "test_crate_othermod_test_bar" },
];
// Provided by libtest
pub fn libtest_setup(TestConfig) -> TestResult { /* ... */ };
pub fn libtest_teardown(TestConfig) -> TestResult { /* ... */ };
// Thin wrapper over the User's tests that captures panics and (optionally) output
pub fn crate_mod_setup(TestConfig) -> TestResult { /* ... */ };
pub fn crate_mod_teardown(TestConfig) -> TestResult { /* ... */ };
pub fn test_crate_mod_test_foo(TestConfig) -> TestResult { /* ... */ };
pub fn test_crate_othermod_test_bar(TestConfig) -> TestResult { /* ... */ };
Then Cargo or any test runner would do the following:
SETUP
, TEARDOWN
and TESTS
, read the names which indicate symbols in the shared librarySETUP
TESTS
and aggregate the results. The test runner can run these in series or parallelTEARDOWN
So any runner can get fine control over each exact test without duplicating setup/teardown work.
It seems like this could be a nicely flexible solution, but I'm sure it has already been discussed somewhere...
Edit: brought this up on zulip to discuss a bit https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/.60rustc.20--test.60.20creating.20a.20.60cdylib.60
@tgross35 people have brought this idea up in the past. My biggest concern is this makes a bigger API surface to stabilize.
I think that these sort of things are lower pressure since it is more like a versioned schema than a user-forward API, i.e. you just need to look for the TESTMODULE_VERSION
symbol to make sure you're using the right interfaces. But it sounds like this is unlikely to be the right path forward anyway, per zulip discussion
As we're talking about running test harnesses in parallel, rather than test binaries, I'm unsure how relevant this will be but some people want to be able to identify when test processes are part of the same "run", see https://internals.rust-lang.org/t/introduce-env-var-cargo-run-id-for-identifying-related-processes-threads/21134
Currently if you have a workspace with a lot of crates,
cargo test --all
runs those tests one crate at a time. This is slow if you have a lot more cores than individual crates can saturate. It would speed things up a lot if cargo could run tests for multiple crates in parallel.