On stdout/err capture (after doing some research around the proof-of-concept #50457):

The current state is sub-optimal, to say the least. Both `stdout` and `stderr` exist as thread-local handles that only the stdlib has access to, which the `print` family of macros use, as does `panic`.
(`PrintWriter`, ...: feel free to use these, I am the author and release them under MIT/Unlicense.)

TL;DR:
To have sane capture behavior for stdout and stderr for the `print`-family macros, `panic`, and `io::{stdout, stderr}`, running each test in its own (child) process is probably required. I'd love to help out wherever I can, but lack confidence.
The only solution I see at this time that maintains the current behavior of separate capture per test is to run each test in its own process, and capture stdout that way.

This would be significantly slower on platforms where process creation is slow, notably Windows. Especially when the tests are all very quick, this would cause severe regressions in total run time.

Another possibility for process separation of tests is to spawn one process per unit of parallelism, with each process reused for multiple tests. A rough sketch of the per-test variant follows.
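As a loose illustration (not anything implemented), assuming the harness re-invokes its own binary with a hypothetical `--run-single <name>` flag so each test executes in a child process:

```rust
use std::process::{Command, Stdio};

/// Hypothetical coordinator step: re-run the current test binary with a
/// made-up `--run-single` flag so one named test executes in a child
/// process, whose stdout/stderr we capture in isolation.
fn run_test_in_child(test_name: &str) -> std::io::Result<(bool, Vec<u8>, Vec<u8>)> {
    let exe = std::env::current_exe()?;
    let out = Command::new(exe)
        .arg("--run-single")
        .arg(test_name)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .output()?;
    // Everything the test printed (including panic messages) is cleanly
    // separated from other tests, because it lived in its own process.
    Ok((out.status.success(), out.stdout, out.stderr))
}
```

A pooled variant would keep N such children alive and feed them test names instead of spawning one process per test.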
Of course, we could also just promote the thread-local replaceable stdout/err to `io::Stdout`/`Err`, but that would be suboptimal itself.

The issue is that stdout is, as far as I can tell, a process-local resource. To capture `io::Stdout` we either need our own stdout replacement machinery on top, or separate processes. (A small example of the thread-local leak is below.)
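To make the limitation concrete, here is a small example of the leak being described (behavior as of this discussion): output printed from a thread spawned inside a test bypasses libtest's thread-local capture and lands on the real stdout.

```rust
#[test]
fn capture_leak() {
    // Captured: this goes through the test thread's replaced stdout handle.
    println!("captured by libtest");

    std::thread::spawn(|| {
        // Not captured: a fresh thread gets the real process stdout, so this
        // line shows up in the terminal even though the test passes.
        println!("escapes libtest's capture");
    })
    .join()
    .unwrap();
}
```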
If I want to argue about this, do I argue about it here?
So, like, I made the `hexchat-plugin` crate. The problem is you can't test the hexchat side without hexchat, so I'd need some sort of `#[hexchat_test]`.

But I still want users to be able to use plain old vanilla Rust `#[test]`s. I shouldn't have to reimplement or replace the whole testing system just to be able to produce some hexchat-based unit tests (in addition to normal tests).

(Disclaimer: I don't fully understand the RFC, but it seems to require either replacing everything, or not having tests.)
@SoniEx2 this RFC is about how we'd like to change the underlying mechanism used to arrange testing in Rust so that authors can write new test frameworks (e.g., quickcheck) that have access to the same mechanisms as the existing built-in tests (able to access private functions, can find all tagged functions, etc.). The current proposal is relatively wholesale (you change the test framework for a whole crate), but this is something that may be improved upon. We decided against more sophisticated designs in the initial implementation phase (this is just an "experimental" RFC) so that we don't bite off more than we can chew, but I think down the line we'd want more fine-grained test framework selection too.
As for your specific example, you don't actually need custom frameworks for that feature :) It seems like you want to only run a test if hexchat is running? To do that, you could either mark the test with `#[allow_fail]`, or have the test just `return` if it fails to connect. If you wanted a more fine-grained approach, you could write a procedural macro that rewrites any function annotated with `#[hexchat_test]` into a `#[test]` that returns if hexchat isn't running. A sketch of that expansion follows.
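For illustration, a sketch of what that rewrite could expand to (the `hexchat_is_running` probe is hypothetical):

```rust
// Hypothetical probe; a real version would check for a live hexchat host.
fn hexchat_is_running() -> bool {
    false
}

// What `#[hexchat_test] fn my_plugin_test() { ... }` could expand to:
#[test]
fn my_plugin_test() {
    if !hexchat_is_running() {
        return; // effectively skipped, though it reports as "ok"
    }
    // ...assertions that need a live hexchat instance would go here...
}
```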
No, you don't understand: they have to run inside hexchat, loaded with dlopen (or equivalent), after plugin registration.
Hey all, I've come up with a simplified proposal and I have an implementation now (though it needs a little cleaning up before a PR).
https://blog.jrenner.net/rust/testing/2018/08/06/custom-test-framework-prop.html
Lemme know what you think!
I've thought some more about stdout/stderr capture. Here's a draft of a system that should be able to "perfectly" capture stdout/stderr, though not without O(1) (once per test suite) overhead: the binary built by `cargo test` does no actual testing; its job is to coordinate child processes that do the actual tests.

@djrenren finally had a chance to read your proposal. I'm excited by the progress on this, though your write-up did raise some questions for me:
- Early on, you say that the test annotation can be applied to "a `const`, `static`, or `fn`". Further down, when you discuss the use-case of a framework that supports test suites, you show a `#[test_suite]` annotation on a `mod` with `#[suite_member]` on the nested tests. It'd be good to show how that expansion works. Does a proc macro turn `#[test_suite]` into a `const` with an element for each `#[suite_member]`? How does it find all of them? Are the `#[suite_member]`s all turned into `#[test_case]`? How do you avoid double-executing those test cases (once as a "child" and once due to the `#[test_case]` annotation)?
- For `#![test_runner]` you say "The type of the function must be `Fn(&[&mut Foo]) -> impl Termination` for some `Foo` which is the test type", yet further down you write a `Fn(&[&dyn Foo])`. I assume the latter is the correct version?
- Is `TestDescAndFn` sufficient for our purposes here? I don't think it can (currently) wrap a `struct` or `const` in a meaningful way?

I think in most cases the CUT (crate under test) won't need to change itself to integrate with the IDE test runner, so long as the IDE understands the test runner you're working with. As I understand it, JetBrains' IntelliJ IDEA integrates with both JUnit and Rust's libtest by communicating over the standard API and stdin/out. JUnit's integration is obviously tighter, as IDEA was a Java IDE first, but the integration with libtest tests in Rust is already quite good even without any specific integration required.
Obviously the IDE needs to understand how the test runner works, or the test runner needs to understand how the IDE works; but in most cases, when using a popular test runner, the CUT shouldn't need to change to be tested with in-IDE functionality.
I'm gonna assume this isn't for me, as a crate author, but for IDE developers.
Let me know when crate authors are part of the plan.
@jonhoo

> Early on, you say that...

I don't want to get too bikesheddy on how that particular macro would work, but basically we'd just produce a single const that had the `#[test_case]` annotation. The children would be aggregated inside it.

> I assume the latter is the correct version?

`Fn(&[&Foo]) -> ()` is a subtype of `Fn(&[&mut Foo]) -> impl Termination`. It's not required that `Foo` be a trait, but if it is, then you need `dyn`, so those two lines aren't in contradiction.

> Is TestDescAndFn sufficient for our purposes here?

I agree `TestDescAndFn` is not sufficient, though it could successfully be constructed from all sorts of things with the use of proc macros. I think for libtest we should just make a trait that essentially mirrors `TestDescAndFn` and accept that.
One of the earlier RFCs had a section on separating the test harness from the test executor, with the former controlling output and the latter how tests are written and executed.
This is still entirely possible under the current design; it just requires the test runner to be designed to be pluggable.
@SoniEx2 It is for crate authors! 😄
For most crate authors (unless you're authoring test machinery), you'll just import any macros you want for declaring tests or benchmarks, and then you'll pick a test runner. For most cases the built-in one should be fine though.
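For illustration, here's a minimal sketch of that workflow using the merged nightly primitives (the `custom_test_frameworks` feature), with an intentionally trivial runner:

```rust
// crate root (nightly only)
#![feature(custom_test_frameworks)]
#![test_runner(crate::my_runner)]

// The runner picked for the whole crate: it receives every #[test_case] item.
fn my_runner(tests: &[&dyn Fn()]) {
    println!("running {} tests", tests.len());
    for test in tests {
        test();
    }
}

#[test_case]
fn smoke() {
    assert_eq!(2 + 2, 4);
}
```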
But how do you run tests inside a hexchat instance, considering the whole hexchat plugin loading system and stuff? I mean, you can't (easily) run hexchat (a GUI-based networked program) in cargo, can you?

Can I assume this won't help test integrations (how your plugin behaves running inside a host program) and is more geared towards testing your plugin's independent parts?
For those worried about stdout capture, I've just demonstrated how this proposal can allow for @CAD97's style of multiprocess stdout capturing: https://github.com/djrenren/rust-test-frameworks/blob/master/multiprocess_runner/src/lib.rs
This code compiles and runs under my PR: https://github.com/rust-lang/rust/pull/53410
The output of `cargo test` is available in the same directory and is called `output.txt`.
And just for clarity, the example does spawn a process for each test, which is suboptimal on OSes where process creation is slow, e.g. Windows. A real runner would implement a scheme like I described earlier to avoid spawning more processes than necessary. (And actually parallelize the running of tests.)
Whoops, I'm (very) late to the party.
I have read the blog post but not looked at the rustc implementation, and I'm concerned by the presence of `impl Termination` and `String` (see `Testable.name`) in the proposal.

How will this proposal work with the version of `#![no_std]` binaries that's being stabilized for the edition release? Those `#![no_std]` binaries use `#![no_main]` to avoid the unstable `start` (i.e. the `fn main()` interface) and `termination` (i.e. the `Termination` trait) lang items. Without `start`, if `rustc --test` injects a `main` function into the crate it will end up being discarded, as these binaries use a custom entry point. Without `termination`, the `Termination` trait doesn't exist (it is defined in `std`, after all).
Also, it would be best if using a custom test runner didn't require an allocator, as that requires quite a bit of boilerplate (e.g. `#[alloc_error_handler]`) and uses up precious Flash memory.
Of course, I'm not going to block any PR since this is in eRFC state and the idea is to experiment; I just want to make sure that the embedded use case is not forgotten and that what we end up stabilizing does support it.
cc @Manishearth: I thought `cargo fuzz` used `#![no_main]` and that it had similar needs to the embedded use case, but perhaps the fuzzed programs do link to `std`, so they are not that similar?
@japaric: support for the embedded use case has been a goal for me from day one, though I will confess I'm not intimately familiar with it.

The `Testable` trait in the proposal is defined by the testing framework, not the language, so it can definitely be designed without the need for allocation. Test runners definitely don't require dynamic allocation now, because the slice that's passed is static.

As for the `#![no_main]` problem, I'm not very familiar with it so I'll have to review a bit, but I see the issue... We might have to revise a bit so the test runner can be called from a generated `start` instead of `main`. A rough sketch of an allocation-free runner follows.
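To sketch what an allocation-free, `#![no_std]`-compatible runner could look like (illustrative only, leaving the entry-point wiring and real reporting aside):

```rust
#![no_std]

// No String, no Box, no Termination: just a static slice of plain
// function references, so nothing here needs an allocator.
pub fn runner(tests: &[&dyn Fn()]) {
    for test in tests {
        // A real embedded runner would report progress over semihosting,
        // ITM, or a serial port before and after each call.
        test();
    }
}
```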
Yeah, cargo-fuzz uses `no_main`, but I thought we had dropped that use case from the original eRFC? (Or had considered it low priority.)
cargo-fuzz can be changed to be non-no_main with some libfuzzer edits.
Hey all,
As I'm wrapping up my internship and will have less time to work on this, I figured I'd leave some notes for the community on the state of things merged so far and going forward. (I'll still be around, but a full-time job will definitely slow my contributions).
Current status:
- `#[test_case]` and `#![test_runner]` are merged

Work to be done:
- Move `test::test_main_static` to accepting a trait instead of a struct so it can be compatible with other test data formats beyond `TestDescAndFn`
- Prove out the system by porting an existing framework (`criterion` or `wasm-bindgen` is probably a good place to start)

Having presented this once at Mozilla and at a meetup, I've gotten a bit of feedback that's worth mentioning. There's concern about only having a single test runner for the crate. @SergioBenitez recommended having `#[test_case]` be parameterized with what kind of test it's annotating, and building separate binaries for different kinds of tests.
I think this adds a reasonably heavy requirement on configuration and build-tools, so I'm not sure I agree, but the point is well-taken that cooperation between tests and runners could be tenuous under the current circumstances. I believe that having a default `Testable` trait accessible in the compiler could act as a de facto standard for compatibility.

I'm sure he could say more about it and offer a stronger argument, but either way this is something we should sort out. Luckily these things can be implemented on top of the current primitives, so experimentation should be doable.
Happy to expand on that, @djrenren.
The current implementation makes it impossible, without workarounds or test runner intervention, to write tests that are intended to be run by different runners in a single crate. For example, consider a crate that would like to use both `quickcheck` and `criterion`:
```rust
#[quickcheck]
fn identity(a: i32) -> bool { ... }

#[criterion_bench]
fn benchmark(c: &mut Criterion) { ... }
```
Under the current plan, this would expand to:
```rust
#[test_case]
fn identity(a: i32) -> bool { ... }

#[test_case]
fn benchmark(c: &mut Criterion) { ... }
```
To run these under the current proposal, a user would need to provide a single runner that can execute both of these test cases. Of course, that's clearly not the intention here; one runner (`quickcheck`) is responsible for `identity`, while a different runner (`criterion`) is responsible for `benchmark`. Alas, there is no way to express this.
I see two potential solutions to this problem: one puts the onus on the test runner and `cargo`, the other on the standard library and `cargo`. The first requires test runners to emit a `cfg` when they emit a `test_case`, to ensure that annotated items are only compiled when that runner is requested. With this approach, `criterion` and `quickcheck` would be modified so that the original code would instead expand to something like:
```rust
#[test_case]
#[cfg(target_runner = "quickcheck::run")]
fn identity(a: i32) -> bool { ... }

#[test_case]
#[cfg(target_runner = "criterion::bench")]
fn benchmark(c: &mut Criterion) { ... }
```
When a runner `foo::bar` is selected, `cargo` would set the `target_runner = "foo::bar"` flag automatically.
The second solution is almost exactly like the first but moves the responsibility for emitting the `cfg` attribute from the test runner crate to the standard library. Under this approach, the `test_case` attribute takes a single parameter: the path to the runner intended to be used for the given test case. With this approach, `criterion` and `quickcheck` would expand the original code to something like:
```rust
#[test_case(quickcheck::run)]
fn identity(a: i32) -> bool { ... }

#[test_case(criterion::bench)]
fn benchmark(c: &mut Criterion) { ... }
```
Then, a `test_case(runner)` is only compiled when `runner` is the selected runner; how this happens is an implementation detail, but I presume it would work similarly to the `cfg` approach above.
I believe this second approach is the better one because it is less error-prone, more succinct, and hides implementation details. With the first approach, all crates must follow the appropriate convention or risk breaking consumer crates: if even a single crate does not `cfg` a `test_case`, it again becomes impossible to write test cases for multiple runners in a single crate. What's more, the first approach results in adding boilerplate to every test runner crate and requires `cargo` to recognize a new `cfg` that must be stabilized.
> I think this adds a reasonably heavy requirement on configuration and build-tools...

I disagree. If `cargo` is going to be aware of test runners anyway, having it emit a `cfg` that records which runner was selected is a trivial addition.
Why is the user restricted to selecting a single test runner? It would be nice to compile one binary that executes test cases from multiple runners.
I like the idea of `#[test_case(quickcheck::run)]`. Just to strawman out an idea that would allow for running multiple test runners in one go:
```rust
fn main() { // entry generated by `cargo test`
    let libtest_tests = &[ /* collected tests */ ];
    let quickcheck_tests = &[ /* collected tests */ ];
    let criterion_tests = &[ /* collected tests */ ];
    ::test::run_tests(libtest_tests);
    ::quickcheck::run(quickcheck_tests);
    ::criterion::bench(criterion_tests);
}
```
`cargo test` would allow filtering specific tests as normal, but would also allow filtering by runner if you, say, only wanted to run the `criterion` benchmarks, or wanted to exclude them from your normal test run. (For that matter, maybe the same treatment should be given to `cargo bench`: add a `#[bench_case]` attribute so that benchmark crates don't have to pretend to be test runners.)
@SergioBenitez - My rebuttal:
I think your view of test runners is different from what I envisioned, and that's causing some confusion. But firstly, I'd like to point out that the multiple-test-runner case works under the current implementation, and looks roughly like so:
```rust
#![cfg_attr(quickcheck, test_runner(quickcheck::runner))]
#![cfg_attr(criterion, test_runner(criterion::runner))]

#[cfg(quickcheck)]
mod quickcheck_tests {
    #[quickcheck]
    fn identity(a: i32) -> bool { ... }
}

#[cfg(criterion)]
mod criterion_benches {
    #[criterion_bench]
    fn benchmark(c: &mut Criterion) { ... }
}
```
This misrepresents the relationship between a test and its runner, though. In general, tests should know how to run themselves. In fact, this is how `#[test]` works today:
```rust
#[test]
fn foo() {}
```
expands to:
```rust
fn foo() {}

#[test_case]
const foo_gensym: TestDescAndFn = TestDescAndFn {
    desc: TestDesc {
        name: TestName::Static("foo"),
        ignore: false,
        should_panic: ShouldPanic::No,
    },
    testfn: TestFn::StaticTestFn(|| {
        assert_test_result(foo())
    }),
};
```
Here `testfn` captures how to run the test. It enforces the interface that tests panic if they return a non-zero termination.

Test runners are concerned with structural information about the test: anything needed for reporting, filtering, or dispatching. This is why quickcheck can reduce to a `#[test]` function with a simple `() -> ()` interface (and transitively into a `TestDescAndFn`).
It's reasonable for a minimal test interface to look like:
```rust
trait Testable {
    // panic on failure
    fn run(&self);
    fn name(&self) -> String;
    fn is_bench(&self) -> bool;
}
```
Then your example:
```rust
#[quickcheck]
fn identity(a: i32) -> bool { ... }

#[criterion_bench]
fn benchmark(c: &mut Criterion) { ... }
```
Would expand to:
```rust
fn identity(a: i32) -> bool { ... }

#[test_case]
const identity_test: SimpleTest = SimpleTest {
    test_fn: || { quickcheck_run(identity) },
    test_name: "identity",
    is_bench: false,
};

fn benchmark(c: &mut Criterion) { ... }

#[test_case]
const benchmark_test: CriterionTest = CriterionTest {
    bench_fn: benchmark,
    test_name: "benchmark",
};
```
And there would be corresponding impls for the test runner:
```rust
impl Testable for SimpleTest {
    fn run(&self) {
        (self.test_fn)()
    }
    fn name(&self) -> String {
        self.test_name.to_string()
    }
    fn is_bench(&self) -> bool {
        self.is_bench
    }
}

impl Testable for CriterionTest {
    fn run(&self) {
        (self.bench_fn)(criterion::global_instance())
    }
    fn name(&self) -> String {
        self.test_name.to_string()
    }
    fn is_bench(&self) -> bool {
        true
    }
}
```
Explicitly: there should not be a 1:1 relationship between test runners and test declaration styles. This allows the test runner to manage the execution of tests that were not created with it in mind.

Under the proposed addition of `#[test_case(some_runner)]`, the test declaration-style author (for example, quickcheck) would be the one making the call about the test runner.
> Why is the user restricted to selecting a single test runner? It would be nice to compile one binary that executes test cases from multiple runners.
A test runner is essentially a main function. It doesn't make sense to have two of them, for the same reasons it doesn't make sense to have two `main` functions. Because a test runner defines the command-line interface for a test binary, having multiple would result in contention over this interface.
For example, @CAD97's suggestion is actually what I implemented at first. The issue is that both `::test::run_tests` and `::quickcheck::run` allow filtering by test name. What happens when I run:
```
$ ./test_bin mod_a::mod_b::test_name
```
You'd need a more complex mediator that did something like:
```
$ ./test_bin quickcheck mod_a::mod_b::test_name
```
If we allow multiple test runners, this mediation would have to be provided by the compiler, which puts us back in the realm of being opinionated about how test binaries should work. Such a setup can instead be implemented in user space with the current implementation; a sketch follows.
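To illustrate, a user-space mediator under the current implementation could look something like this (all sub-runner entry points here are hypothetical stand-ins, not real APIs):

```rust
// Hypothetical sub-runner entry points; real ones would come from the
// respective crates and take their own test-list types.
fn libtest_run(args: &[String]) { /* ... */ }
fn quickcheck_run(args: &[String]) { /* ... */ }
fn criterion_bench(args: &[String]) { /* ... */ }

// One "super generic" runner that owns the CLI and dispatches on the
// first argument, e.g. `./test_bin quickcheck mod_a::mod_b::test_name`.
fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    match args.first().map(String::as_str) {
        Some("quickcheck") => quickcheck_run(&args[1..]),
        Some("criterion") => criterion_bench(&args[1..]),
        _ => libtest_run(&args),
    }
}
```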
> In general, tests should know how to run themselves. In fact, this is how #[test] works today.
I'm not sure what you could mean by "tests should know how to run themselves". A `#[test]` clearly doesn't know how to run itself: it requires the testing library to parse things like `should_panic`, to execute the function, and to catch panics. A `#[quickcheck]` test also clearly doesn't know how to run itself: it requires `quickcheck` to run the function against a myriad of inputs and then prune to find a minimal failure, which again, are foreign to the test itself.
> Explicitly: there should not be a 1:1 relationship between test runners and test declaration styles

Sure, but I don't see how my proposal enjoins this.
> Under the proposed addition of #[test_case(some_runner)], the test declaration-style author (for example, quickcheck) would be the one making the call about the test runner.
I think this is the desire in almost every case. I understand the principle behind this plug-and-play, swappable-runner, trait-based testing, but I don't see it getting use in practice. Think about what such use would mean: one crate would propose a testing interface, and some other crate would say "Oh, sure, I can run that!". It won't happen serendipitously, since the interface will become part of the public API, so one crate is making a deliberate decision to allow other runners to run "its" tests. If that's the case, then that crate can simply take in a runner in its own attribute:

```rust
#[quickcheck(some::runner)]
```
> A test runner is essentially a main function. It doesn't make sense to have two for the same reasons that it doesn't make sense to have two main functions.
This is only true because you're defining it that way.
> Because a test runner defines the command-line interface for a test binary, having multiple would result in contention over this interface.
I think this is an interesting design point: should a test runner be able to define the CLI for the binary? I can see arguments both ways. If not, then this is a non-issue, but if so, I wouldn't be opposed to passing test-runner-specific arguments, as you suggest, with something like:

```
$ ./test_bin --quickcheck quickcheck args here --criterion some other args
```

This would mean that either the binary can't accept any `--` args, or the top-level CLI parser is clever enough to group things appropriately.
Overall, I think your design optimizes for the wrong thing while making a common thing very difficult and tedious to accomplish. I don't see a compelling argument for swappable runners, and making it cumbersome to have tests for multiple runners in a single crate (a situation I postulate will be very common; my code does this now, for instance) is unfortunate.
I'd also like to push back a little bit here @djrenren: it's not clear to me that a "clean" division of "test runners are just main functions that deal with CLI + output" and "tests that know how to run themselves" is sufficient. It could be, but it'd be nice to see some evidence for it (similarly, @SergioBenitez, can you think of any examples where that separation is not sufficient?). There are definitely cases where a test runner should have more knowledge about the exact tests, such as "run at most this many tests in parallel within this test suite", or "this suite of tests share the following set-up, which should only be run once". It could be that these can all be dealt with using shared state among the generated tests + mutexes, but some discussion around it would be good :)
> tests for multiple runners in a single crate, a situation I postulate will be very common

Could you share some other places where this is common?
For example, my experience has not included any language where a project has multiple test runners in a single "place". The closest I can think of is multi-language projects. For example, the Rust Playground has Rust unit tests, JS unit tests, and Ruby integration tests, each with their own test runners. Thus I postulate it is not common 😇
Even in a Rust-only project, I think I'd prefer to place benchmarks in a completely separate file (a.k.a. a separate crate) from unit tests. I never want to run tests and benchmarks together and the information reported for each is drastically different.
> I understand the principle behind this plug-and-play, swappable-runner, trait-based testing, but I don't see it getting use in practice
I agree that it seems unlikely that there will be One True Trait for defining a test. I'd expect there to be multiple test runners, each with their own highly-specific trait.
What I would expect is to see adapters for tools like proptest that connect to popular test runners, leveraging each runner's strengths. Thus, the crate `proptest-stainless` would provide the macro support needed to define and connect a proptest-based test with the stainless test runner. Ideally, these macros would have high overlap with other adapters like `proptest-builtin` and `proptest-some-new-cool-runner`, but they would also allow tying into runner-specific functionality (e.g. the test output could show the number of runs or the failing inputs in a better way).

It's possible that over time some basic commonality can be discovered (perhaps something along the lines of TAP?), but there will always be something special a test runner will want to provide.
@jonhoo I tried to give a couple of examples indicative of this in my previous comment, but perhaps we have different notions of what "separation" means in this case. Could you perhaps expand on the question?
@shepmaster In Rocket, I'd love to have the following in a single crate:
- … regular `#[test]` tests

I'm totally okay with benchmarks being in another crate. The first four are currently regular `#[test]` tests, since custom test frameworks don't exist, but I can see a nice interface for all four given this new functionality. It'd be odd to split them up into four (or five, if `#[test]` uses `#[test_case]`) crates, and having to manually specify a `cfg` to run the particular tests means not only more code, but also that, with high probability, users won't run them before they submit a PR.
> I agree that it seems unlikely that there will be One True Trait for defining a test. I'd expect there to be multiple test runners, each with their own highly-specific trait.
Note that my proposal continues to make this possible, albeit by making it explicit in the code.
Apologies if this is the wrong place for this, but it seems to be the furthest along the road I've traveled in search of skipping tests by detecting support for what is being tested at runtime. Basically something like `#[ignore(lacks_support)]`, where `fn lacks_support() -> bool` (or maybe `-> Result<(), String>`, to give a reason support is missing) is a function within the relevant scope. My use case is that I'm testing features tightly tied to the kernel that is being run (this is for the keyutils crate), and I'd like the tests to indicate that they are not actually giving useful information rather than just bailing out and saying "yep, totally works".
This was mentioned in https://internals.rust-lang.org/t/pre-rfc-make-test-and-bench-more-flexible/1328, but that discussion has died down.
So, is this something that is basically under this RFC impl (as something like an additional method on the trait for whether the test should be ignored or not) or is it something that should go into a new RFC? Thanks.
@mathstuf I think "Is there a simple way to conditionally enable or ignore entire test suites in Rust?" and "Is it possible to conditionally enable an attribute like `derive`?" cover your needs if support is statically determinable, like a feature flag. You can run a build script to test your system and set those feature flags, as well.
Using a `build.rs` to determine it is probably sufficient for now (though the Rust bindings for performing these checks are what is being built… so that seems awkward… unless tests can have their own `build.rs`?), but I think other things also fit under this umbrella. This also has precedent in other test frameworks (e.g., pytest has support for skipping based on runtime checks).

Actually, `build.rs` would completely break for cross-compiling, because what matters is the target machine's feature set, not the host machine's, so runtime-determined skipping still seems very relevant to me in the wider ecosystem.
Thank you very much; now I can't compile my project using Rust 1.38.0-beta.1:

"use of unstable library feature 'test': `bench` is a part of custom test frameworks which are unstable"

And it's not even inside my code (a library I use has this error), so I can't change it.
> In general, tests should know how to run themselves. In fact, this is how #[test] works today.
I'll give a counter-example. In https://crates.io/crates/datatest, each "test" is a template for tests which is "rendered" by the test runner, either based on scanning the filesystem for files matching a pattern or by scanning a given YAML file for test cases. So there are no "tests" unless the test runner renders them based on external data.

I could instead run all the test cases in a single `#[test]`, but a large part of datatest's UX is that each individual test case is treated as a separate test (you can run it individually, for example, the same way you can run an individual test with `cargo test -- <test>`).

It works well with the nightly features `#[test_case]` and `#[test_runner(datatest::runner)]` (though I have to explicitly support "regular" `#[test]` tests). Stable is a different story... 😅
@idubrov, just discussed this in the dev tools meeting. As far as we could tell there was not much progress on this over the last months. We do want this to pick up speed again, though! If you'd like to help out from the rustc side too, let us know! It's always good to have actual use cases to work towards :)
@gnzlbg, it seems you worked on libtest/extracting libtest? Can you maybe give a quick summary of the state of this?
I'm happy to help, but I'm a bit confused about the scope of the problem. I've read the RFC and I have some basic understanding of the other pieces; however, the big picture is something I'm still not clear about.

To me, the whole testing framework seems like a combination of the following aspects:

1. Test runners. Things like scheduling threads, capturing input and output, collecting statistics (counts of failed tests, and so on). It seems like requirements could vary across different test runners.
2. API for test execution. This would be the `cargo`-based command-line API as well as IDEs discovering tests (which, perhaps, should work via `cargo` anyway). Given the previous item, it seems like different tests should have different entry points. However, at the same time, if custom test kinds are allowed and, essentially, unlimited, there should still be a single entry point to the whole bag of test runners somehow? Otherwise, how would you run them? Create a `cargo-xyz-test` for each test framework? Another reason to allow everything in one crate would be so you don't have to compile the same binary for each test runner. On the other hand, separate binaries might be necessary for some types of tests (like benchmarking tests requiring a "release" build)?
3. Declaring tests. In my opinion, the ideal scenario would be if you could mix and match tests of all various kinds in the same crate (in other words, you should not have to select only one single test runner for your crate). My personal preference would be something like https://docs.rs/linkme/0.1.3/linkme/ or https://internals.rust-lang.org/t/idea-global-static-variables-extendable-at-compile-time/9879. I think it would be a very nice generalization of what the compiler currently achieves with `#[test_case]`. But then again, I don't know if that would work, for example, for fuzzing tests? They might want to do compilation their own way?
4. Rendering test "instances" based on their static declaration in code. This is to take test declarations from 3 and generate input for test runners in 1. I make this very artificial separation between a test "declaration" (what you see in code) and an actual test instance (what you see in the output of `cargo test -- --list`). However, this separation would serve "datatest" the best. Currently I override the standard test runner and "render" tests before delegating to it, but in fact, I don't really want to intercept the test runner. Even more, I would love to be able to support other test runners (say, criterion) by "expanding" to their test declarations from my test "patterns". I guess, for "datatest" in particular, this might be generalized as simply as saying that these test declarations should be lists of functions that return test declarations rather than lists of declarations themselves (and then the format of these declarations, like https://doc.rust-lang.org/test/struct.TestDescAndFn.html, should be a stable public API).
Hey @idubrov, I can answer a lot of these questions.
Every executable needs a single entrypoint, so when we build a test executable there needs to be some `main`, and that `main` needs access to all the tests it's going to run. This inherently couples three problems: What is a test? How do I run it? How do I report it?

The test runner exists to answer these questions. For `libtest`, a test is a `TestDescAndFn`; it runs tests by invoking the wrapped function, and it reports them using its built-in reporting infrastructure.

This is okay, but as you've noted it leaves out a lot of possibilities. For example, this is a very flat structure: if you want test suites with setup and teardown, doing that in this model is awkward. Or if you want to test without panic unwinding enabled, you're out of luck (this is actually still a problem with the current `test_runner` setup, but it should be fixable).
So the test runner has to answer these questions, but it can delegate them to the test through the use of some trait. Imagine it had a trait like `MyTestTrait` with a method `report()`. Then this test runner becomes flexible; a sketch of this delegation follows.
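A small sketch of that delegation, with `MyTestTrait` as the illustrative trait name from above (not an existing API):

```rust
// The runner owns scheduling and the CLI; each test type decides how it
// runs and how it reports itself.
trait MyTestTrait {
    fn name(&self) -> &str;
    fn run(&self) -> bool; // true = pass
    fn report(&self, passed: bool); // test-defined reporting
}

fn flexible_runner(tests: &[&dyn MyTestTrait]) {
    for test in tests {
        let passed = test.run();
        test.report(passed);
    }
}
```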
In order to allow CTFs to work in all environments, some of the logic should be in `rustc`. You raise a valid point that building for `test` shouldn't imply `dev`.
Right now, `rustc` can accept a `test_runner` parameter the same way it receives any other crate attribute. By making `rustc` flexible enough, we can allow Cargo to orchestrate these options and shape convention. For this reason we left the `cargo` command-line interface mostly out of scope. But note that `test_runner` can be used even in Cargo integration test files.
As stated above, there needs to be a single entrypoint to the executable, so "multiple test runners" essentially means "one super generic test runner", which you can build today. If there ends up being a right™ way for such a thing to work, we could consider stabilizing it. For now, the tools to build it exist.
The other option is compiling different executables for each test runner. This is also doable today using `#[cfg(..)]`: if you put `#[cfg(quickcheck)]` over each test that's meant to run with quickcheck, and you put `#![cfg_attr(quickcheck, test_runner(quickcheck::runner))]` in your `lib.rs`, then you're off to the races. The former can be hidden behind macros, and the latter could eventually be orchestrated by Cargo profiles.
Solutions like `linkme` suffer from not being particularly cross-platform, whereas `test_case` works by manipulating the Rust AST, ensuring that its aggregation works everywhere `rustc` works.
When I was designing the CTF approach, people were rightfully concerned about creating a truly generic aggregation technique and opening Pandora's box to all sorts of weird shenanigans in the ecosystem. Also, the effects of aggregation on incremental compilation were (and probably still are?) not well understood. `#[test_case]` works well because it could eventually be reimplemented in terms of such a primitive (should it ever come to exist), in much the same way `#[test]` is now built on `#[test_case]`.
All that `cargo test -- --list` does is build the test executables and pass `--list` to them. This means you could build any similar behavior into a test runner today. Case in point: libtest has it. It could be implemented for a more generic interface than `TestDescAndFn`, but ultimately that's all there is to it.
P.S. While I've covered a lot of it here, my user-centric writeup of the current implementation is here: https://blog.jrenner.net/rust/testing/2018/08/06/custom-test-framework-prop.html
@djrenren how does this proposal relate to the whole RFC? Is it one possible direction?

I still can't make the proper connections in my head; your proposal seems to be at least partially implemented in nightly Rust (and a custom test runner is what I use in my "datatest" on nightly). However, the RFC itself is very confusing.
Ah yeah, the RFC is very wishy-washy (by nature of being an eRFC). It basically says something along the lines of "find use cases, make them work".
The `test_runner` approach is fully implemented in nightly rustc (though there's some weirdness around `no_std` to work out). The current approach in nightly is one possible direction, but it was developed with a good deal of input from the tools team. From a design standpoint it represents a "maximally generic, minimal surface area" approach. In my opinion, it represents the best path forward, but at the very least, it provides a really powerful base on which to experiment with different APIs if you have different ideas.

The RFC really suffers from a lack of scoping. For example, there will always be some kinds of testing we don't support, but the RFC doesn't draw this line. For my approach, I ended up ruling out doctests and compiletests, because these rely on concepts above the language level. (Though a variation of each could be expressed.)

The purpose of the current implementation was to provide a grounding for a specific RFC (which I arguably should have written, but then grad school caught up with me). Through implementation we were able to learn a lot about what's possible, what's advisable, etc. that wasn't known when the eRFC was written. So that's the state of things.
Does that make sense?
> Ah yeah, the RFC is very wishy-washy (by nature of being an eRFC). It basically says something along the lines of "find use cases, make them work".
Does my above comment count as a "found use case" that can be made to work in an official RFC? Would it help if support for such a thing were added to your experimental branch?
So you just don't care that you brought a breaking change into the language? Don't break your own promise; call it Rust 2.0 then.
@mathstuf, for sure! Here's a sketch of how you'd solve your problem using the tools currently available in nightly:
- Declare your own test type that carries a runtime skip check, and have a macro emit a `#[test_case] const ...` value that generates values of this type.

Right now this is a little painful if you're migrating from existing libtest stuff. Ideally you'd just declare a new trait to describe your `SkippableTest` type and implement it for `TestDescAndFn`, and then you'd have backwards compatibility with regular `#[test]` functions. Unfortunately, we can't do this because `TestDescAndFn` isn't stable. This is one of the reasons @gnzlbg's work on extracting libtest is so important: if `libtest` is just a regular crate you can add to your dependencies, then you can rely on `TestDescAndFn` from that crate, which has far weaker stability requirements than anything that ships with rustc. A rough sketch of the idea is below.
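A rough sketch of that `SkippableTest` idea (every name here is illustrative, not an existing API):

```rust
// A test type that can decide at runtime whether it should be skipped,
// and why. A macro would emit each of these as a `#[test_case] const` item.
pub struct SkippableTest {
    pub name: &'static str,
    pub skip_reason: fn() -> Option<&'static str>, // e.g. probe kernel features
    pub run: fn(),
}

// The crate-level runner honors the skip check and reports the reason.
pub fn runner(tests: &[&SkippableTest]) {
    for test in tests {
        match (test.skip_reason)() {
            Some(reason) => println!("test {} ... skipped ({})", test.name, reason),
            None => {
                (test.run)();
                println!("test {} ... ok", test.name);
            }
        }
    }
}
```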
@e-oz please keep discussion on-topic and civil. Nothing being discussed here involves breaking changes as far as I can see. If there are breaking changes, we appreciate them being pointed out as we do not wish to break backwards compatibility.
@e-oz it sounds like your issue is the result of a library you consume taking advantage of unstable Rust features, which is not allowed outside of nightly. This is a long-standing policy: the compiler has nightly-only features to allow for experimentation with APIs before they are stabilized.

If `#[bench]` was working on stable or beta Rust then it was a bug, and it got fixed. If this happened because you moved from nightly to beta then it is expected behavior. You can solve your problem by going back to nightly, though in general I don't recommend depending on unstable nightly features, as they are subject to sudden change.
> Does my above comment count as a "found use case" that can be made to work in an official RFC? Would it help if support for such a thing were added to your experimental branch?
Yes, this is something that is already supported with the nightly features / @djrenren's proposal. The whole "datatest" thing I'm doing is basically generating test instances at runtime. Ignoring at runtime works, too (here I ignore by path, but I could just as well ignore by anything else): https://github.com/commure/datatest/blob/114ea40c983ae108f6d7b70e6bb6fc5f3de56e0f/tests/tests/mod.rs#L74-L86

One thing, though, is that this requires overriding the test runner. I wonder if a lot of use cases could be covered by allowing the list of tests to be `fn() -> &[TestDescriptor]` instead of `TestDescriptor`. Or any other way (compatible with `no_std`) of enumerating test descriptions at runtime rather than providing them directly? A sketch of that indirection follows.
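A sketch of that indirection (`TestDescriptor` is illustrative, and a `Vec` stands in for the borrowed slice a `no_std` version would want):

```rust
// Instead of collecting descriptors directly, collect functions that
// *produce* descriptors at runtime (e.g. by scanning a data directory).
pub struct TestDescriptor {
    pub name: String,
    pub run: fn(),
}

fn datatest_cases() -> Vec<TestDescriptor> {
    // A real implementation would enumerate files matching a pattern here.
    vec![TestDescriptor { name: "data/case_1.yaml".into(), run: || {} }]
}

pub fn runner(generators: &[fn() -> Vec<TestDescriptor>]) {
    for generate in generators {
        for test in generate() {
            (test.run)();
            println!("test {} ... ok", test.name);
        }
    }
}
```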
OK. If we have docs for doing that, that sounds fine to me. I really just want the test report to explicitly say "skipped because $reasons" rather than:
- not showing up at all (compile-time decisions)
- spurious passing (because nothing was actually tested)
- spurious failure (really annoying)

Right now, my solution is to have a "sentinel" test whose output is basically "this module's tests are going to fail because the features aren't supported".
Thanks; I'll take a look at nightly when I get a chance to see what I can do there.
Oh, and just in case you aren't aware: if you can determine that you want to ignore the test at compile time, you can use the `ignore` attribute, either with attr_if or your own macro, to conditionally insert it.
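For instance, the standard `cfg_attr` mechanism can conditionally attach `ignore` (the cfg name here is illustrative, e.g. set by a build script):

```rust
// Ignored whenever the (illustrative) cfg is set at compile time,
// e.g. by build.rs probing the build environment.
#[test]
#[cfg_attr(kernel_lacks_keyutils, ignore)]
fn keyring_roundtrip() {
    // ...test body...
}
```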
I'm aware; it's just that for cross-compiling, compile-time detection is kind of useless for detecting kernel features available :) .
This is a tracking issue for the eRFC "Custom test frameworks" (rust-lang/rfcs#2318).
Documentation:
Steps:
Unresolved questions:
Notes:
- `#[bench]`: https://github.com/rust-lang/rust/issues/63798#issuecomment-526593504

Implementation history: