Closed: gnzlbg closed this issue 3 years ago
I know I'd love to see this work! Do you know what hoops a prototype would have to jump through? Also, does this work reliably on platforms like Windows? Or is it still best on Linux and "works mostly elsewhere"?
A prototype would need to generate the LLVM IR for the instrumentation; for that, it needs to generate a mapping between source ranges and performance counters (I would start with just functions). Then it needs to generate and embed the IR in the object files. It would be a good idea to look at how Clang then outputs this into a file, to use the same format (which is documented), and to test that we can read it with llvm-cov on Linux and macOS (I don't know about Windows support, but since Clang is gaining full Windows support, if it's not there it will be there soon).
I think that would be enough for a prototype; from there we can move on to generating instrumentation for conditionals (match/loop/jumps/ifs...) and blocks (how many iterations a loop body executed). We could go down to expressions, but then it would be wise to offer users a way to control how much instrumentation is generated: functions, branches, and loops (which are branches) are typically enough. We should then support skipped regions (for conditional compilation), expansions (for macros, maybe plugins), and dealing with unreachable code (there is a performance counter that is always zero for that).
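As a rough illustration of the granularity described above, here is a hypothetical sketch of where function- and branch-level counters would sit in Rust source (the counter numbers and their placement are invented for illustration, not what rustc would actually emit):

```rust
// Hypothetical counter placement for function- and branch-level coverage;
// the numbering scheme here is invented for illustration.
fn classify(n: i32) -> &'static str {
    // counter #0: function entry
    if n > 0 {
        // counter #1: "then" branch
        "positive"
    } else {
        // no dedicated counter required: the count is derivable as #0 - #1
        "non-positive"
    }
}

fn main() {
    assert_eq!(classify(5), "positive");
    assert_eq!(classify(-1), "non-positive");
    println!("ok");
}
```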
The meat of the work is in the source-to-performance-counter mapping, generating the IR, and generating the conforming output. llvm-cov works at least on macOS and Linux, but IIRC clang has options to generate coverage information even for particular gcov versions. The source code for all this is available, so taking a look wouldn't hurt.
Also, does this work reliably on platforms like Windows? Or is it still best on Linux and "works mostly elsewhere"?
It works on Mac and Linux; I don't know about Windows. Even if it doesn't work everywhere, this is probably the way to make it work on as many platforms as possible anyway. Clang support on Windows is pretty good already and it is only getting better.
@sujayakar's done some awesome work to prototype this on OSX at least, discovering:

- `-C passes=insert-gcov-profiling` adds the instrumentation pass
- `-C link-args=-lclang_rt.profile_osx` links the profiling runtime, and that library can be located relative to `xcrun --find clang` in a path like `usr/lib/clang/7.3.0/lib/darwin/libclang_rt.profile_osx.a`
- running the instrumented binary produces a `.gcda` file, and there's an `llvm-cov` tool that can be run over the output file to generate information

That should at least get code coverage working on OSX in some respect! This is also a pretty strong case that it may not be too hard to actually get this working for all platforms if LLVM's standard tool suite "just works".
@alexcrichton That's gcov coverage, which is completely different from what @gnzlbg meant in this issue.
The correct pass name is `instrprof`, implemented in lib/Transforms/Instrumentation/InstrProfiling.cpp. See also LDC's LLVM profiling instrumentation.
@sanxiyn is correct.
I just want to add that I don't think we should implement clang-like gcov coverage generation.
LLVM format can be used for many more things (e.g. profile guided optimizations), and LLVM's llvm-cov tool can generate gcov files from it.
afaik, there's a PR open for this: https://github.com/rust-lang/rust/pull/38608
As far as I can tell, #38608 is still gcov coverage and not instrprof.
I believe this is now implemented as `-Z profile` and tracked at https://github.com/rust-lang/rust/issues/42524, so closing in favor of that.
This is not implemented yet. LLVM coverage is different from gcov coverage, which is also different from sanitizer coverage. LLVM implements three separate coverage mechanisms.
Would LLVM coverage (`instrprof`) enable usable code coverage of generics-heavy code? None of the existing code-coverage solutions work well with heavy use of generics.
My project (a gameboy emulator) makes very heavy usage of generics and it'd be wonderful to get code coverage working on integration tests but I'm not really sure how.
I think good code coverage support is the single most important piece of tooling missing in Rust.
When I run `cargo test` and everything is green, I get a good feeling, but this feeling actually means nothing at all. To make `cargo test` mean something, it would need to at least tell me how many code-paths of my application have been exercised.
This information does not mean much either (just because a code-path was exercised does not mean that it was exercised for all inputs), but it would mean more than nothing, and it would make writing tests and assessing the value of tests way easier.
IMO adding this kind of capability to `cargo test` (and yes, it should be part of `cargo test`; people should always know their code coverage) would be extremely valuable.
There is only one other piece of tooling infrastructure that would come close to this in value, and that would be an undefined-behavior detector for all Rust code.
This might sound like a rant, but I just want to raise awareness that this is a very important issue at least for me (and, from the other issues being filed, for others as well) because it directly affects the correctness of Rust programs (we don't want them to only be memory safe, but also to have correct logic).
Maybe if `rustfmt`, clippy, and the RLS advance enough during the impl period, the Rust tools team could prioritize these two tools next?
If I don't have precise auto-completion information or my source code is not perfectly formatted, well, I can live with that. But if my program has undefined behavior and I am not catching it because I am only testing 40% of my code-paths then... I am screwed. Does this make sense?
This is not implemented yet. LLVM coverage is different from gcov coverage, which is also different from sanitizer coverage. LLVM implements three separate coverage mechanisms.
Hi, can anyone explain what the difference is between "LLVM coverage" (`profile-instr-generate`) and "gcov" (`insert-gcov-profiling`)? I don't know the detailed implementations of these two passes, but will `profile-instr-generate` be more accurate than `insert-gcov-profiling`?
I found @kennytm's answer to my question. Let me put it here in case someone like me gets lost in the overwhelming references: https://github.com/rust-lang/rust/pull/44673#issuecomment-330435323
I've done a bit of prototyping on this to see what would be involved in using `instrprof` for coverage, and it's revealed (to me) that the coverage aspect of this feature is secondary to the "Profile Guided Optimization" feature that `instrprof` was originally written for.
So, here's a vague "plan of action" (with progress so far) to enable PGO and LLVM-assisted coverage:
This is currently implemented in its most basic form as `-Zpgo-gen`, which adds the `-pgo-instr-gen` and `instrprof` passes to LLVM to add instrumentation of all functions and basic blocks (loop bodies, if statements, etc.). There may be other places where it's useful to add instrumentation (e.g. adding vtable-downcast counts to enable `-pgo-icall-prom` passes, which turn dynamic dispatch into static dispatch in fast paths).
:+1:
With the above work complete, the generated binaries will count (in-memory) each time an instrumented code path is hit. To extract this information, the profile-rt library should walk the various `__llvm_prf_*` sections of the binary/memory and dump the values out to a file (Clang dumps to `default.profraw`). The format of this file is (as far as I can see) "documented" in LLVM/Clang's codebase. On the other hand, the current implementation of rustc's profile-rt library is taken wholesale from LLVM and thus already supports this function.
:+1:
With a `default.profraw`, one can create a `profdata` file (using `llvm-profdata merge`). This file can be passed to the `pgo-instr-use` (and other) passes to enable optimisations based on gathered profile information. Again, this is supported in nightly rustc today (through `-Zpgo-use`).
:+1:
LLVM's coverage tooling can use a "Coverage Map" to map between instrumentation counters and blocks of code. The format of the map is well documented at http://llvm.org/docs/CoverageMappingFormat.html and there's code in LLVM (http://llvm.org/doxygen/classllvm_1_1coverage_1_1CoverageMappingWriter.html) for creating the magic format. Note that this may require rustc manually inserting instrumentation (rather than using `-pgo-instr-gen`) since it needs to know which counter corresponds to which code block. The code in Clang (https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CodeGenPGO.cpp) that does this is probably a good place to start.
Note that the LLVM coverage map format supports various useful features, not well handled in GCOV-style coverage:
❌
Creating a `cargo coverage` subcommand (or, possibly better, just creating coverage from `cargo test`) by adding profiling to the `test` cargo profile and running the resulting profile data through LLVM's tooling would be really cool. Especially if the output format is compatible with tools like Coveralls.io and can produce useful graphical outputs for analysis afterwards. Likely requires stabilising `-Zpgo-gen` or at least exposing enough of it that cargo can create a profiled build.
:x:
There are some bits of Rust code for which coverage is non-obvious:

* How do we handle non-instantiated generics? If a generic is never instantiated then no code is created for it, so there are no counters for it. I don't think llvm-cov knows how to handle this in the coverage map at the moment, but it might make sense for it to do so to support this for C++ templates too?
* Similarly, should separate instantiations of the same generic be covered separately? If I test `Foo<i32>` in my UTs but use `Foo<String>` in my real code, have I really tested my code (in this case, `String` needs `drop` so maybe there's a subtlety there)?
* Macro expansions (and `#derive`s) need care too - the expanded code coverage is potentially uninteresting (if you trust the proc-macro) or critical (if you don't).
I think Rust should behave the same as Clang here. And if I remember correctly, non-instantiated templates are reported.
* Similarly, should separate instantiations of the same generic be covered separately? If I test `Foo<i32>` in my UTs but use `Foo<String>` in my real code, have I really tested my code (in this case, String needs `drop` so maybe there's a subtlety there)?
llvm-cov at least has the option to show coverage per instantiation (or merged, via `-show-instantiations`). So I think it makes sense to provide this as well.
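To make the instantiation question concrete, here is a small sketch: one generic function used at two types. With per-instantiation reporting (as `llvm-cov -show-instantiations` does for C++ templates), the two uses below would be counted separately; merged reporting would fold them into one:

```rust
use std::fmt::Debug;

// One generic function, two monomorphized instantiations.
// Per-instantiation coverage would report wrap::<i32> and
// wrap::<String> as separate entities.
fn wrap<T: Debug>(value: T) -> String {
    format!("{:?}", value)
}

fn main() {
    assert_eq!(wrap(1i32), "1");                  // exercises wrap::<i32>
    assert_eq!(wrap(String::from("a")), "\"a\""); // exercises wrap::<String>
    println!("ok");
}
```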
Great write-up @bossmc! I'll take a stab at adding LLVM coverage map generation, unless it's already being worked on elsewhere?
Whilst I'm happy to try and figure out a solution on my own, a little direction from the compiler team would no doubt ensure that I arrive at something that's acceptable far sooner! Would anyone be willing to mentor this?
Cc @michaelwoerister
I don't have time to mentor this but feel free to ask specific questions!
@bossmc Are you sure that `-Zpgo-gen` (now stabilised as `-C profile-generate`) adds LLVM's `-pgo-instr-gen` and `-instrprof` passes? My understanding is that it "merely" instruments via gcov, which is the very issue under discussion here? Have I missed something?
`-C profile-generate` is exactly the same as Clang's `-fprofile-generate`.
Thanks @michaelwoerister, that's what I thought—the difference being, AIUI, that `-fprofile-generate` instruments the IR whereas `-fprofile-instr-generate` (which this issue seeks) effectively instruments the AST (via source mapping).
Maybe debuginfo can help there?
Unfortunately, for the reasons @kennytm gave in the comment to which @mssun linked above, debuginfo is not sufficient.
@eggyal yes, `profile-generate` enables two passes:

* `pgo-instr-gen`, which heuristically adds http://llvm.org/docs/LangRef.html#llvm-instrprof-increment-intrinsic (and related intrinsics) to the LLVM IR, basically by adding them to the beginning of code blocks.
* `instrprof`, which lowers the above intrinsics to "real code". I assume these are intrinsics so the implementation of profile counting can be smart (threadsafe in threaded programs, using native types where needed, etc.).

Together (with the profile-rt library) these allow generation and subsequent use of profiling data during compilation. Unfortunately, together these two passes are insufficient for code-coverage usage, since the LLVM IR is too far away from the AST. Part 4 of my write-up discusses writing the coverage mapping code that maps AST nodes to instrumentation intrinsics - this is what then enables frontend-guided coverage.
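Conceptually, what the `instrprof` lowering produces is just an array of per-region counters bumped at runtime. This sketch models that in plain Rust (the real lowering uses LLVM intrinsics and dedicated linker sections, not a static array like this):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hand-rolled stand-in for the counters that instrumentation would emit:
// one slot per instrumented region, incremented on entry.
static COUNTERS: [AtomicU64; 2] = [AtomicU64::new(0), AtomicU64::new(0)];

fn instrumented(n: i32) -> bool {
    COUNTERS[0].fetch_add(1, Ordering::Relaxed); // region 0: function entry
    if n > 0 {
        COUNTERS[1].fetch_add(1, Ordering::Relaxed); // region 1: "then" block
        true
    } else {
        false // no counter here: derivable as region 0 minus region 1
    }
}

fn main() {
    for n in [-1, 2, 3] {
        instrumented(n);
    }
    let entry = COUNTERS[0].load(Ordering::Relaxed);
    let then = COUNTERS[1].load(Ordering::Relaxed);
    assert_eq!(entry, 3);
    assert_eq!(then, 2);
    assert_eq!(entry - then, 1); // the "else" count, computed rather than stored
    println!("ok");
}
```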
I suspect that adding the coverage map will require disabling the `pgo-instr-gen` pass and adding the intrinsics manually (this seems to be what Clang calls `profile-instr-generate`?), because there's Rust-specific knowledge that can/should be applied (e.g. generic expansions, semantically equivalent AST changes, etc.) and because there's no way to guess the profiling entries that will be generated by `pgo-instr-gen`.
https://github.com/rust-lang/rust/pull/48346/files#diff-2030f049f8df2454e3720ea9403bc14aR446 is where we started adding the `pgo-instr-gen` pass.
I am very eager to see this feature work for Rust. I was the person who designed and originally implemented Clang's profile-instr-generate feature (for both PGO and code coverage). I'm currently working on a project that is implemented in Rust and I miss the ability to get good code coverage data. I'm not yet familiar with the internal workings of the Rust compiler, but I'd be happy to consult with anyone who is working on it. I can start by confirming that to get the best results, the instrumentation should be added in the Rust frontend by walking the AST, prior to any internal lowering, so that the AST corresponds closely to the source constructs. Adding instrumentation in an LLVM pass can work OK for PGO but it's not what you want for code coverage.
Thank you both @bossmc and @bob-wilson — all very helpful! I've spent some time over the past week getting to grips with how instrprof is implemented in Clang, and have begun prototyping some broad structures of a Rust implementation.
I had pretty much come to exactly the conclusion you mention, @bob-wilson: in order to closely align coverage with the source, instrumentation will need to be added before the AST is lowered (or at least, before HIR is lowered to MIR—this seems slightly better to me as macros will then have been expanded). However, in order to maintain backend-independence, this presumably means that new instrumentation intrinsics will need to be introduced into the MIR (and possibly the HIR)?
As an intermediate step before working on that goal, I am currently thinking that I will first instrument the MIR on lowering to LLVM IR—and thereby prove that I have both instrumentation and coverage map generation working at that level first. But perhaps that would be unnecessarily arduous and I would be best to start with the AST/HIR?
Otherwise I do think that I have a reasonable feel right now for my next steps, but am more than happy to work on this with anyone else who'd like to contribute! Also very grateful to those who've offered their input, especially since this is my first foray into the internals of rustc.
I think the instrumentation and coverage map should be generated from the ASTs after macro expansion and before lowering to HIR. Adding the instrumentation is relatively easier than generating the coverage mapping, so I wouldn't bother doing that on MIR first -- you would probably end up throwing most of it away. I do agree with you that it would be best to start with the instrumentation, and adding instrumentation intrinsics to HIR and MIR seems like the right thing to do.
Another thing that I've been pondering is exactly what should be excluded from the coverage map? Clang excludes system headers, but perhaps we should exclude all dependencies from outside the package/workspace being built?
Can anyone confirm the semantics of `syntax::source_map::SourceFile::is_imported(&self) -> bool`?
I think (though I suspect @bob-wilson will know better and can feel free to correct me) that the coverage format has the capability to represent "macros" (remote code) sensibly (https://llvm.org/docs/CoverageMappingFormat.html#file-id) so maybe we'd want to apply instrumentation directly to the parse tree?
@bossmc: Aye, expansion regions are perfect for macros—and they can be fully generated from the Span info in the HIR, whereas the parse tree would not have the expanded code to reference (thus requiring subsequent modification of any found expansion regions to complete the referencing).
Ah yes, I'm being slow :) - so long as the expanded code knows it came from a macro then HIR is the correct point indeed.
Based on the RustC guide, I had the impression that HIR has a number of lowering changes that diverge from a 1-1 mapping to the source code. Trying to map back to the exact source locations for the original constructs seems like it may be difficult. The RustC guide also describes macro expansion as happening prior to lowering to HIR. Is that out of date? It would definitely be best to insert the instrumentation and build the coverage mapping at a point where macros have been expanded, but I was hoping that could be done before other lowering transformations.
Perhaps one of the seasoned rustc developers could chip in on this, as I'm not entirely certain—but after digging through the source for a bit it seems to me that MIR is perfect for instrumentation and source-mapping: I suspect that the coverage regions for a given function are exactly the MIR `Statement` descendants of its `Body` that are `StatementKind::Assign`, and their source locations can readily be obtained from their Span data (yielding either a code region or an expansion region); skipped regions might even be identifiable by `Statement` descendants that are `StatementKind::Nop`—although perhaps some such regions will not have any presence whatsoever in the MIR?... which then might only leave gap regions (that I confess I don't yet entirely understand).
which then might only leave gap regions (that I confess I don't yet entirely understand)
Edit: see below.

I understood it to be for an empty statement, e.g. `if (condition) {/*gap*/}`?

At least that's what I'm gathering from CoverageMapping.h:

/// A GapRegion is like a CodeRegion, but its count is only set as the
/// line execution count when its the only region in the line.
Although you may very well be able to implement some kind of code coverage support at the MIR level, that would be a different approach than what we did in Clang. One key idea from Clang's implementation is that both the instrumentation and the coverage map are directly tied to the source constructs. As one example, take an "if" statement. The instrumentation is done by inserting a counter at the start of the "then" block. If there is an "else" block, there's no need to put a counter there because you can calculate the execution count from the parent count and the "then" block count. This also holds true in the coverage map, where the goal is to precisely map the execution counts back to source locations.

So why should this be done before any lowering? As an example, take one thing that I saw in the RustC Guide: "if let" is lowered to HIR as a "match". A match statement can have any number of arms, so it generally needs to be instrumented differently than an "if" statement. When lowering from an "if let", presumably there is only one arm (or two, if there was an "else") but even if there is a way to recognize that, it would need to be a special case and I suspect it would get quite complicated. Could you just go ahead and instrument it as a "match"? Sure, but then you're likely to run into challenges mapping back to the exact source locations.

I can give an example of a gap region that is somewhat relevant here. Say that you put a blank line between the "if let" condition and the start of the "then" block. What execution count should be displayed for that blank line? Clang treats this as a gap region from the end of the condition to the start of the "then", and it chooses to use the "then" block count for that region. I haven't given it a lot of thought, but I think the answer may be different for a "match" statement. It would be strange for that blank line to show the execution count of the first arm of the match.
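The lowering described above is easy to demonstrate in source form: the two functions below are semantically identical, which is precisely why a post-lowering instrumenter would see them the same way even though a user might expect different coverage presentation for each:

```rust
// `if let` and its (hand-written) `match` desugaring. After lowering,
// both forms look like a two-armed match, so distinguishing them for
// coverage presentation needs pre-lowering information.
fn describe(opt: Option<i32>) -> &'static str {
    if let Some(_) = opt { "some" } else { "none" }
}

fn describe_desugared(opt: Option<i32>) -> &'static str {
    match opt {
        Some(_) => "some",
        _ => "none",
    }
}

fn main() {
    assert_eq!(describe(Some(1)), describe_desugared(Some(1)));
    assert_eq!(describe(None), describe_desugared(None));
    println!("ok");
}
```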
That's really helpful, thanks @bob-wilson.
I'm definitely not averse to injecting the instrumentation earlier (it was what I had earlier thought would be needed), but I'm still not completely convinced that MIR lacks the necessary information to produce the desired result. Taking the example you mention, of `if` being lowered to a `match`, the generated MIR `TerminatorKind::SwitchInt` will have the relevant number of `.values`/`.targets`—and a counter expression can be used in place of an outright counter if appropriate, irrespective of whether the original source was an `if` or a `match`. So far from being a special case, it actually enables `match` expressions with fewer arms to be handled with counter expressions in exactly the same way as an `if`. Nor do I think there's any issue mapping back to source locations, as the containing `Terminator` structure has `.source_info` that contains the generating Span.
Part of my comment with that example is that I suspect you might want different behavior for the gap region after the "if let" condition than for the gap region after a match condition with 1 arm, and you may not be able to distinguish them easily after lowering.
The other missing part of this discussion relates to using these profiles for PGO in the context where someone wants to save a profile as part of their project (e.g., in a case where collecting a good profile is labor intensive). We had a design goal that the profile should remain usable in that scenario even if you had updated your compiler version so that the internal lowering changed. We did that by tying the profile structure to the source code structure as reflected in the pre-lowering ASTs. I don't know if Rust will ever support that style of instrumentation-based PGO, but I think it is still a valuable feature for some styles of development and it would make sense to me to implement the coverage profiling in a way that would align with that in case someone wants to support it in the future.
More generally, I think you can probably follow Clang's lead in many aspects of this work. It's an approach that has been tried and works well in practice. If there's a compelling reason to do it differently in Rust, then it might make sense, but otherwise, my advice is to follow Clang's approach.
Is there even a need to leave out the counter on the `else` block? Isn't that one of the reasons that Clang's coverage does not work reliably when exceptions are thrown?
@pJunger: It's not "leaving out" the counter, but rather providing means for deducing its value with zero runtime overhead.
@bob-wilson: I'm not sure I'm following the thinking re gap region differences following `if let` and `match`—would you be able to elaborate with an example? You're no doubt right about the PGO side of things (especially when rustc's internal structures are still quite fluid) and the principle of following Clang generally... I'll give it some more thought! My only inkling for starting from MIR is that it has essentially done a lot of the grunt work already, and starting from the AST would be duplicating much of that work.
Leaving out the "else" counter reduces the runtime overhead of the instrumentation and produces smaller profile files. There might be some 2nd order effects related to throwing exceptions, but the primary trouble with exceptions is that we thought it would be too expensive to insert a counter after every function call. The issue here is that throwing an exception can essentially cause a difference between the execution count going into a function call and the count coming out of it. The brute force approach to detect that would be to put a counter after every call. You might be able to do something clever with automatically generated exception handlers, but I'm not aware of anyone who has looked into that.
@eggyal : Consider "if let condition" followed by a blank line vs. "match condition" followed by a blank line. What execution count should be displayed for that blank line? For the "if", Clang shows the "then" count. The corresponding behavior for the "match" (or a match lowered from "if let") would be to show the count for the first arm of the match. That just seems wrong to me.
Okay, I understand... although wouldn't this be more a coverage tool issue rather than an instrumentation issue? Indeed, shouldn't the coverage tool show the execution count of any lines between if/match arms as being either skipped or that of the containing `if { }`/`match { }` region itself—i.e. without any need for explicit gap regions?
This is definitely outside my area of expertise, but one observation: the `if let` vs `match` distinction is captured in the MatchSource field of each `hir::ExprKind::Match` value.

Yes, it's a coverage tool issue. There are a lot of cases like this where there is no clear right answer and the best compromises require understanding the semantics of the code. (I.e., you want the compiler to figure it out instead of relying on some pattern matching in an IDE editor.) The instrumentation part is relatively easier than the coverage mapping, and you definitely want the instrumentation and coverage mapping to be generated from the same ASTs, so be sure to think about the coverage tool issues when deciding where to insert instrumentation.
Regarding the gap regions, I'm not the expert on that part, but I remember that there was quite a lot of iteration trying to come up with solutions that most people were happy with. It's really not easy. For example, for blank lines between match arms, you don't want them to show up as being unexecuted, but using the count from the containing region can give very surprising results, e.g., if none of the match arms ever execute but the blank lines between them have large non-zero counts from the enclosing region. Most people seem to expect that those blank lines will be included as part of the code region right before them, but then you run into problems with gaps after control flow.
@rkruppe Clang does not generate coverage instrumentation from LLVM IR. That's way too low level to do a good job. For gcov-style instrumentation, we used to rely on debug info to get back to source locations, and that really doesn't work well in practice. Debug info tends to be optimized to work well for single stepping in a debugger, and it's not so good for precisely identifying code coverage regions.
@bob-wilson Obviously this is better for runtime performance & data sizes.
I'm just asking because in the automotive industry every decision & condition has to be covered (including those that throw/panic), so taking over the design from Clang 1:1 would make this instrumentation useless for that. (And this is a bit sad, because otherwise you could interpolate condition/decision coverage from those counters and compete with Bullseye Coverage and the likes.)
@bob-wilson
@rkruppe Clang does not generate coverage instrumentation from LLVM IR. That's way too low level to do a good job. For gcov-style instrumentation, we used to rely on debug info to get back to source locations, and that really doesn't work well in practice. Debug info tends to be optimized to work well for single stepping in a debugger, and it's not so good for precisely identifying code coverage regions.
Yes, I draw the analogy between generating instrumentation from MIR and generating it from LLVM IR to suggest that generating it from MIR won't be any better.
Regarding the gap regions, I'm not the expert on that part, but I remember that there was quite a lot of iteration trying to come up with solutions that most people were happy with. It's really not easy. For example, for blank lines between match arms, you don't want them to show up as being unexecuted, but using the count from the containing region can give very surprising results, e.g., if none of the match arms ever execute but the blank lines between them have large non-zero counts from the enclosing region. Most people seem to expect that those blank lines will be included as part of the code region right before them, but then you run into problems with gaps after control flow.
Not that I propose addressing it, but this does rather sound like a UI issue, since the execution counts relate to regions rather than lines but (presumably for historical reasons) the tools only display line counts... these problems seem to stem from that disparity? Anyway, I almost have both instrumentation and mapping generation working from MIR and hope from that to glean a much better understanding of these problems! Will then work on moving to the AST instead.
@bob-wilson: I'm delighted to say that I now recognise the folly of trying to generate these mappings from the MIR: it's not practical to comprehend program flow through the BasicBlocks well enough to establish the code regions. Nevertheless I'm pleased that I undertook the exercise to understand the problem better—thank you for patiently indulging me whilst I caught up with you!
By the way, maybe this goes without saying, but I definitely recommend following the Clang implementation as closely as makes sense. For the instrumentation code, see lib/CodeGen/CodeGenPGO.{h,cpp}. The coverage mapping code is in lib/CodeGen/CoverageMappingGen.{h,cpp}.
I guess one difference will/should be that `return` and `?` need to be handled?
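To illustrate why `?` needs special handling: it introduces an implicit early-return edge with no explicit branch in the source, so coverage needs a counter (or counter expression) for the hidden `Err` path:

```rust
use std::num::ParseIntError;

// The `?` on `parse()` hides an early return: the Err path exits the
// function without reaching `Ok(n * 2)`, so an instrumenter must count
// that edge even though no `if`/`match` appears in the source.
fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.parse()?; // implicit early-exit edge on Err
    Ok(n * 2)
}

fn main() {
    assert_eq!(parse_and_double("21").unwrap(), 42);
    assert!(parse_and_double("x").is_err());
    println!("ok");
}
```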
There are ways to more or less easily obtain code coverage information from rust binaries, some of which are in widespread use (e.g. travis-cargo + gcov + coveralls.io). However, these are either platform specific (gcov/kcov are linux only), or incomplete (coverage for documentation tests is not collected).
It would be better if rustc would be able to instrument rust binaries and tests using LLVM Coverage Instrumentation. This would allow code coverage information to work portably across the supported platforms, as well as produce reliable coverage information.
Ideally, such support would be integrated in an easy-to-use `cargo coverage` subcommand and the Rust Language Server, such that IDEs can use this information to guide the user while writing tests.