Closed dkuehr closed 2 years ago
Oh, that's interesting. I shouldn't have been accessing the first element without checks. On the other hand, I am very curious to find out why that vector was empty, as it breaks some of my assumptions. Is there a way for me to reproduce it?
In this code, there are two loops.
single_counters
, they are those set up by rustc with the -Z instrument-coverage
option). expression_counters
). Those counters are given by expressions over the single_counters
.It is much clearer with an example. With the following function:
fn foo() {
// c0
if x() {
// c1
} else {
// c2
}
}
We have a total of 3 instrumented code regions: c0
, c1
, and c2
. The Rust compiler creates an array of u64
somewhere in memory to store the coverage hits for each code region. But this array only contains two elements:
[c0, c1]
And the instrumented function foo
is compiled to something equivalent to:
fn foo() {
array[c0] += 1;
if x() {
array[c1] += 1;
} else {
}
}
And somewhere in the executable file, it stores a message describing how to obtain the hit count for the c2
code region, which essentially says:
c2 = c0 - c1
When the CodeCoverageSensor
is created, it looks for the “physical” array of counters (i.e. c0
and c1
) as well as the message in the executable that gives the “derived” counters (i.e. c2
). The former one is stored in coverage.single_counters
and the latter one is given by coverage.expression_counters
.
Anyway, what seems to be happening here is that there is an instrumented function which has no physical counters at all, which surprises me a little bit. It might be that there is a bug in the construction of CodeCoverageSensor
, or it may be that this is actually perfectly normal.
But if it is indeed normal, then I think it would make more sense to filter out the function entirely during the construction of CodeCoverageSensor
, so that the fuzzcheck never needs to iterate over it at all.
Thank you for the detailed explanation :)
The issue was triggered while fuzzing functionality from TezEdge, but I wasn't able to produce a standalone PoC program that reproduces the issue.
The patch I provided was enough to avoid the issue, but as you mention it isn't the best solution.
That's ok, thanks for identifying the issue! I'll release an update soon with a fix, whether I understand the root cause or not :)
Would you mind adding a panic statement right before the out-of-bounds access:
let mut index = 0;
for coverage in coverage {
if coverage.single_counters.is_empty() {
panic!(
"{} {} {:?} {:?}",
index,
coverage.expression_counters.len(),
coverage.function_record.name_function,
coverage.function_record.filenames,
)
}
// ...
to see if it panics, and tell me what is printed?
Here are the results:
thread 'main' panicked at '19683 1 "_RNCINvNtNtCs5i7zDO3ioyK_14tezos_messages3p2p14binary_message28all_consuming_complete_inputNtNtNtB6_8encoding10connection17ConnectionMessageNvYB1q_NtNtCs33WiA1oqOVT_14tezos_encoding3nom9NomReader8nom_readE0B8_" ["tezos/messages/src/p2p/binary_message.rs"]', /home/dan/.cargo/git/checkouts/fuzzcheck-rs-3f36d9a671138ac4/7abcc13/fuzzcheck/src/code_coverage_sensor/mod.rs:113:21
It seems there is at least 1 expression counter. That code is derived by procedural macros, but if you really need it I could dig out what is the final code generated.
This is an interesting issue because macro-generated code has been problematic with lcov
coverage reports, and for debugging itself. Basically at "symbol" level everything gets hidden behind macros instead of showing the expanded code. Furthermore attempting something like cargo-expand
and rebuilding the expanded code is not possible: https://github.com/dtolnay/cargo-expand#disclaimer
Thanks a lot! I had a quick look and It seems to be a problem caused by debug_assert!
. This function also has the same problem:
fn foo(x: &Option<bool>) {
x.map(|x| {
debug_assert!(x, "x is true");
x
});
}
The optimised MIR output is:
fn foo::{closure#0}(_1: [closure@src/lib.rs:5:11: 8:6], _2: bool) -> bool {
debug x => _2; // in scope 0 at src/lib.rs:5:12: 5:13
let mut _0: bool; // return place in scope 0 at src/lib.rs:5:15: 5:15
bb0: {
Coverage::Expression(4294967294) = 3 - 1; // scope 0 at /rustc/6d246f0c8d3063fea86abbb65a824362709541ba/library/core/src/macros/mod.rs:215:26: 215:56
Coverage::Counter(2); // scope 0 at /rustc/6d246f0c8d3063fea86abbb65a824362709541ba/library/core/src/macros/mod.rs:215:23: 215:87
Coverage::Expression(4294967295) = 1 + 2 for src/lib.rs:7:9 - 8:6; // scope 0 at src/lib.rs:8:6: 8:6
_0 = _2; // scope 0 at src/lib.rs:7:9: 7:10
Coverage::Unreachable for src/lib.rs:6:23 - 6:24; // scope 0 at /rustc/6d246f0c8d3063fea86abbb65a824362709541ba/library/core/src/macros/mod.rs:215:59: 215:84
return; // scope 0 at src/lib.rs:8:6: 8:6
}
There are two things that catch my attention:
Coverage::Expression(4294967294)
is an expression of counter 3
and 1
, which don't seem to exist in this MIR. Coverage::Unreachable
into consideration, and I don't know how it translates to LLVM's coverage mapping, will need to look into it.So it is either a bug in rustc, or a bug in fuzzcheck, or both. I also think it may cause other silent bugs and possible out-of-bounds accesses.
I will post updates here when I get to it, thank you! :)
Great job thanks!
Could it be that other counters are removed by a DCE pass? If my interpretation is correct, expressions reference counters just their u32
identifier but can the compiler actually know a counter is referenced by an expression just by that value to avoid elimination?
Yes, I think that's what's happening! I had a look at the code again yesterday. When a counter is unreachable. its “kind” changes to Zero
.
But Fuzzcheck already handles that, as all counters of kind Zero
are just ignored. What I didn't foresee are some odd interactions between physical, expression, and Zero
counters. Going back to the same function:
fn foo() {
// c0
if x() {
// c1
} else {
// c2
}
}
Imagine that, for some reason, c0
and c1
become unreachable. Then that could be encoded in the executable as:
physical_counters = [0, 0]
counters_with_code_regions = [
{ kind: Zero, region: "..." },
{ kind: Zero, region: "..."},
{ kind: Expression { op: Subtract, idx: 0 }, region: "..." }
]
expressions = [
{ lhs = physical_counters[0], rhs = physical_counters[1] }
]
The problem is that fuzzcheck gets its information about what counters matter from the counters_with_code_regions
array. When it reads that, it sees that there are no physical counters to observe. Which is why the single_counters
field was empty.
But I am a bit surprised that the physical_counters
array is not empty, and that the Expression
still exists. It seems to me that the compiler should have replaced it with Zero
as well, since it only references other Zero
s. As far as I can tell, there is no way for me to deduce that reliably, because nowhere in this message does it say that physical_counters[0]
and physical_counters[1]
are always 0 (no index is attached to Zero
counters). It is not a bug in the compiler per se, but it is slightly annoying.
But anyway, it is pretty easy to fix, since fuzzcheck doesn't do anything fundamentally wrong. It's just an edge case to consider. I'll try and release an update by next week!
And sorry for all the unnecessary details, I am using these issues and pull requests to flesh out my thoughts sometimes :)
Side note:
Since you also mentioned that you had problems with procedural macros, I dug into that and found that fuzzcheck does indeed ignore a lot of code inside macros. However, it appears to be fixable.
While there is no information that links counters to specific code regions, the counters exist and can be used for fuzzing.
If I stop looking at the
counters_with_code_regions
array to know which counters matter and instead consider all physical counters and all (sub)expressions, then macros can be instrumented. This comes, however, at the cost of an explosion in the number of observed counters. So I have been working on a solution to minimise the set of observed counters without losing important feedback. In my tests, that can reduce the number of counters inpulldown-cmark
from ~2700 to ~1500, which is great for performance. I am hoping to ship this at the same time as this bug fix :-)
I realised I forgot to confirm that the bug has been fixed. Thanks for reporting it in such a detailed way!
Unsafe access to the first vector element when the vector was empty was causing a segmentation fault when using fuzzcheck. This patch fixes it.