jonhoo opened this issue 1 year ago
Looks like the incremental directory is really causing us problems because it includes lots of files for every output target (binaries, the crate itself, and integration tests): https://gist.github.com/jonhoo/ef66c75137b1f9a14c322faa747b8cc1
I also ran `perf` on `grcov` after all the `llvm-cov export` invocations have finished (but `grcov` is still running), and it shows a significant portion of execution time being spent in calls to `realpath` via `std::fs::canonicalize` through `grcov::path_rewriting::canonicalize_path` in `grcov::add_results` (called from `grcov::consumer`).
cc @jerel
@marco-c I'm happy to try to help in fixing this, but could use some pointers for how you think this should be handled. For example, would it be okay to just always ignore everything that's in `/incremental/`?
Thanks for splitting this out and digging into it. What you've documented matches what I've seen in practice. As supporting evidence, my workaround completes quickly and (I think) does the same work, but avoids enumerating `target/debug/**`:
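```shell
# rust-profdata / rust-cov are the cargo-binutils wrappers around llvm-profdata / llvm-cov.
TEST_BIN_PATHS=$( \
  cargo test --no-run --message-format=json --tests --lib \
    | grep "{\"" \
    | jq -r "select(.profile.test == true) | .filenames[]" \
    | grep -v dSYM - \
)
rust-profdata merge -sparse ./*.profraw -o combined.profdata
# llvm-cov treats extra positional arguments as source-file filters,
# so every binary gets its own --object flag.
rust-cov export \
  --ignore-filename-regex='/.cargo/registry|.cargo/git|/rustc|target/debug' \
  --instr-profile=combined.profdata \
  --object target/debug/libmy_app.so \
  $(for bin in $TEST_BIN_PATHS; do printf ' --object %s' "$bin"; done) \
  --format lcov > lcov-combined.info
grcov lcov-combined.info -s . -o ./html -t html --ignore-not-existing --llvm
```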
If I'm not mistaken, all of the binaries that grcov needs to load are at the top level of `target/debug/*`, so maybe restricting the search to the top level could work?
I think that kind of filtering would end up happening here so we don't even recurse down into that dir: https://github.com/mozilla/grcov/blob/917494ab5bcdfd675d435922f96ac08d7869aad0/src/producer.rs#L171-L196
The challenge in doing it in `grcov` is that we don't know it's being pointed at a Rust build directory. I suppose we could look for `/{debug,release}/incremental/` and exclude that, but it feels like we may instead want to add a command-line argument to `grcov` to specifically exclude directories. So something like an `--exclude` flag that feeds into a `filter_entry` call on the `WalkDir`.
Ah, wait, no, that's for walking `paths`, whereas what we need to filter is the stuff in `--binary-path`... Hmm...
It's over here that we'll need to make a change: https://github.com/mozilla/grcov/blob/917494ab5bcdfd675d435922f96ac08d7869aad0/src/llvm_tools.rs#L102-L130
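Whichever walk ends up being filtered, the idea behind a `filter_entry`-style exclusion is to prune a directory before descending into it, rather than enumerating its files and discarding them afterwards. A rough std-only sketch of that idea follows; `collect_files` and its `excluded_dirs` parameter are hypothetical stand-ins for a `WalkDir::filter_entry` predicate driven by an `--exclude`-style flag, not grcov's actual code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `root`, never descending into a
/// directory whose name is listed in `excluded_dirs` (e.g. "incremental").
/// Hypothetical helper: `excluded_dirs` plays the role of an `--exclude` flag.
fn collect_files(root: &Path, excluded_dirs: &[&str], out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        let path = entry.path();
        if file_type.is_dir() {
            // Prune: skip the whole subtree if this directory's name is excluded.
            let name = entry.file_name();
            let excluded = name.to_str().map_or(false, |n| excluded_dirs.contains(&n));
            if !excluded {
                collect_files(&path, excluded_dirs, out)?;
            }
        } else if file_type.is_file() {
            out.push(path);
        }
    }
    Ok(())
}
```

With `excluded_dirs = &["incremental"]`, none of the per-target files under `target/debug/incremental/` are ever stat'ed, which is where the savings would come from.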
If it's helpful to your exploration, I had some uncommitted work from several weeks ago where I was experimenting with accepting `binary-path` as a vec of binaries and then passing them in to `export` as `--object` args: https://github.com/mozilla/grcov/compare/master...jerel:multi-binary-path
IIRC it didn't solve the perf issue by itself (in hindsight, probably because it was still walking `target/debug/` somewhere else).
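For illustration, accepting several binaries essentially boils down to building one `llvm-cov export` argument list with an `--object` per binary. The helper below is hypothetical (`llvm_cov_export_args` is not a function in grcov or in that branch) and only assumes `llvm-cov`'s `--format`, `--instr-profile`, and `--object` options.

```rust
use std::path::PathBuf;

/// Hypothetical helper: arguments for a single `llvm-cov export` call that
/// covers several binaries. Extra positional arguments to `llvm-cov` are
/// treated as source-file filters, so each binary gets its own `--object`.
fn llvm_cov_export_args(profdata: &str, binaries: &[PathBuf]) -> Vec<String> {
    let mut args = vec![
        "export".to_string(),
        "--format=lcov".to_string(),
        format!("--instr-profile={}", profdata),
    ];
    for bin in binaries {
        args.push("--object".to_string());
        args.push(bin.display().to_string());
    }
    args
}
```

The resulting vector could then be handed to `std::process::Command::args`.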
Oh, that's interesting. You should open that as a PR independently of this; I think it's a good change in isolation.
btw, in the places where you have `&vec![...]`, I think you can just do `&[...]` instead.
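A small illustration of that suggestion: a function that takes a slice accepts both forms, and `&[...]` borrows a fixed-size array instead of heap-allocating a `Vec` first. `describe_args` is a made-up example function.

```rust
/// Made-up example: join arguments into one display string.
/// Taking `&[&str]` lets callers pass either `&[...]` (a borrowed array,
/// no allocation) or `&vec![...]` (a borrowed, heap-allocated Vec);
/// both coerce to the same slice type.
fn describe_args(args: &[&str]) -> String {
    args.join(" ")
}
```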
I think we have a couple of paths forward:
1. Special-case exclude `/{debug,release}/incremental/`.
2. Support a glob-based `--exclude-binaries` flag to "subtract" from the directory set chosen by `--binary-path`.
3. Special-case exclude `.o` files.
4. Only include files marked executable.
5. Recommend that users pass `target/{debug,release}/deps` to `--binary-path` instead of `target/{debug,release}`.

I don't know which of these paths is better, and we may need the maintainers to chime in on that.
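As a rough sketch, options 1 and 3 could be expressed as a single path predicate applied to each candidate found under `--binary-path`. The function name is hypothetical, and this is not how grcov currently decides what to pass to `llvm-cov`.

```rust
use std::path::Path;

/// Hypothetical predicate combining options 1 and 3: reject anything inside
/// an `incremental` directory and any `.o` object file.
fn is_candidate_binary(path: &Path) -> bool {
    let in_incremental = path
        .components()
        .any(|c| c.as_os_str() == "incremental");
    let is_object_file = path.extension().map_or(false, |ext| ext == "o");
    !in_incremental && !is_object_file
}
```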
I also think it'd be nice if `grcov -v` logged every file it runs through (https://github.com/mozilla/grcov/blob/917494ab5bcdfd675d435922f96ac08d7869aad0/src/llvm_tools.rs#L122-L132), as that would highlight problems in this area much faster.
The ideal solution is probably fixing #535, basically reproducing the way `llvm-cov` ignores files.
> 1. Special-case exclude `/{debug,release}/incremental/`.

I'm not a fan of special cases; they might silently break things for uncommon use cases.

> 2. Support a glob-based `--exclude-binaries` flag to "subtract" from the directory set chosen by `--binary-path`.

This would make things more complex for users of grcov. I'd rather we did the right thing by default.

> 3. Special-case exclude `.o` files.

Same as 1.

> 4. Only include files marked executable.

With this, we'd risk missing some coverage (e.g. from shared objects).

> 5. Recommend that users pass `target/{debug,release}/deps` to `--binary-path` instead of `target/{debug,release}`.

Might work, and hopefully it doesn't break anything, but I'm not 100% sure.
Since 5 was easy to try out, I went down that path (so no more `.o` files), but it's still very slow (multiple minutes) to execute, so I think there's a deeper problem here. An `strace -f -c` of the `grcov` execution gave:
```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.58 1102.814225       19947     55286      2332 futex
 11.71  164.292082         606    271057           sched_yield
  7.83  109.948569         157    697683       263 newfstatat
  1.08   15.155137      797638        19           wait4
  0.20    2.862066          28    100751     29369 read
  0.18    2.581477          29     86427           mprotect
  0.14    2.001592          21     93083           write
```
And here's the `perf report`:
```
- 59.39% 0.00% llvm-cov llvm-cov [.] (anonymous namespace)::CodeCoverageTool::run
- (anonymous namespace)::CodeCoverageTool::run
- 55.27% (anonymous namespace)::CodeCoverageTool::run
- 54.95% llvm::CoverageExporterLcov::renderRoot
- 54.63% llvm::CoverageExporterLcov::renderRoot
- 53.91% llvm::coverage::FunctionRecordIterator::skipOtherFiles
14.34% bcmp
- 4.12% (anonymous namespace)::CodeCoverageTool::load
- 4.11% (anonymous namespace)::CodeCoverageTool::load
- 3.47% llvm::coverage::CoverageMapping::load
+ 1.77% llvm::coverage::BinaryCoverageReader::create
1.22% llvm::coverage::CoverageMapping::loadFunctionRecord
+ 0.64% llvm::coverage::CoverageMapping::loadFunctionRecord
- 21.05% 0.01% Consumer 0 grcov [.] grcov::consumer
- 21.04% grcov::consumer
- 11.91% grcov::llvm_tools::profraws_to_lcov
- 11.91% grcov::llvm_tools::run (inlined)
- 11.91% std::process::Command::output (inlined)
- 11.90% core::result::Result<T,E>::and_then (inlined)
std::process::Command::output::_$u7b$$u7b$closure$u7d$$u7d$::h29c88bdea84f03c7 (inlined)
- std::process::Child::wait_with_output
+ 11.90% std::sys::unix::pipe::read2 (inlined)
- 5.76% grcov::parser::parse_lcov
- 4.09% core::iter::traits::iterator::Iterator::collect (inlined)
- <alloc::string::String as core::iter::traits::collect::FromIterator<char>>::from_iter
+ 4.07% <alloc::string::String as core::iter::traits::collect::Extend<char>>::extend (inlined)
+ 0.54% std::collections::hash::map::HashMap<K,V,S>::insert (inlined)
- 3.37% grcov::add_results (inlined)
- 2.48% grcov::path_rewriting::canonicalize_path (inlined)
+ std::fs::canonicalize (inlined)
0.79% grcov::merge_results
```
10% of execution time is spent in `memset` as part of the read in `Child::wait_with_output`.
So the pipe here is definitely a bottleneck. But the huge number of cycles spent in `skipOtherFiles` in `llvm-cov` also feels pretty concerning.
It seems like a fairly straightforward method: https://github.com/llvm-mirror/llvm/blob/2c4ca6832fa6b306ee6a7010bfb80a3f2596f824/lib/ProfileData/Coverage/CoverageMapping.cpp#L189-L195
But it also appears to do a linear search, which could certainly cause problems for even modestly sized inputs.
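To see why a linear scan hurts: if rendering each file walks every function record to find the ones belonging to that file, total work grows roughly as files × records. A common remedy is to index the records by filename once. The sketch below is a simplified Rust model of that trade-off, not LLVM's code; `FunctionRecord` and both functions are made up for illustration.

```rust
use std::collections::HashMap;

/// Made-up, simplified stand-in for a coverage function record.
struct FunctionRecord {
    filename: String,
    name: String,
}

/// Linear approach: every per-file query scans all records, O(records) each.
fn functions_for_file_linear<'a>(records: &'a [FunctionRecord], file: &str) -> Vec<&'a str> {
    records
        .iter()
        .filter(|r| r.filename == file)
        .map(|r| r.name.as_str())
        .collect()
}

/// Indexed approach: one O(records) pass, after which each per-file
/// lookup is O(1) on average.
fn index_by_file(records: &[FunctionRecord]) -> HashMap<&str, Vec<&str>> {
    let mut index: HashMap<&str, Vec<&str>> = HashMap::new();
    for r in records {
        index.entry(r.filename.as_str()).or_default().push(r.name.as_str());
    }
    index
}
```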
One obvious win here would be to run `llvm-cov export` in parallel. That doesn't solve the fact that `llvm-cov export` itself is seemingly fairly slow here (~4 s per invocation), but at least it avoids the total wait time being 4 s times the number of binaries in `target/debug/deps` (in my case, there are 20).
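A minimal sketch of the parallel approach, using the standard library's scoped threads (`std::thread::scope`, Rust 1.63+) in place of `rayon`. `parallel_map` is a hypothetical helper; in grcov's case, `job` would be whatever runs `llvm-cov export` for a single binary. Results come back in input order, so any later merging stays deterministic.

```rust
use std::thread;

/// Hypothetical helper: apply `job` to every item using up to `workers`
/// scoped threads, returning results in the same order as `items`.
fn parallel_map<T, R, F>(items: &[T], workers: usize, job: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let workers = workers.max(1);
    // Ceiling division so every item lands in some chunk; never zero.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);
    let job = &job;
    thread::scope(|s| {
        // One thread per chunk; each borrows its chunk and the shared job.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(job).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order preserves the input order of results.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Sequentially, 20 invocations at ~4 s each cost about 80 s of wall time; split across N threads, that drops roughly toward 80/N s until CPU or I/O saturates.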
Just invoking `llvm-cov export` on `binaries` through `rayon` gave me a huge speedup, even if I don't specifically subset to `/deps/`. Overall:
```
Benchmark 1: ./original
  Time (mean ± σ):     71.804 s ±  0.373 s    [User: 68.919 s, System: 17.522 s]
  Range (min … max):   71.182 s … 72.400 s    10 runs
Benchmark 2: ./original-deps
  Time (mean ± σ):     44.957 s ±  0.336 s    [User: 50.187 s, System: 7.719 s]
  Range (min … max):   44.540 s … 45.441 s    10 runs
Benchmark 3: ./parallel
  Time (mean ± σ):     20.869 s ±  0.107 s    [User: 121.422 s, System: 43.629 s]
  Range (min … max):   20.749 s … 21.037 s    10 runs
Benchmark 4: ./parallel-deps
  Time (mean ± σ):     14.561 s ±  0.054 s    [User: 94.777 s, System: 8.153 s]
  Range (min … max):   14.474 s … 14.637 s    10 runs
Summary
  './parallel-deps' ran
    1.43 ± 0.01 times faster than './parallel'
    3.09 ± 0.03 times faster than './original-deps'
    4.93 ± 0.03 times faster than './original'
```
The project I'm on has 1,400 items in `target/debug/deps/`, so that 4 s is noteworthy.
You can ignore all the `.d` files, though; grcov filters those out eagerly. What matters is how many binary files you have in `debug/deps` (the number of executable files there is a reasonable proxy). But still, yes.
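To get that proxy number, something like the following Unix-only sketch counts the regular files directly inside a directory that have any execute bit set. It is non-recursive, expects `debug/deps` to be passed in as `dir`, and `count_executables` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical helper: count regular files directly inside `dir` with any
/// execute bit (owner, group, or other) set. Unix-only; not recursive.
fn count_executables(dir: &Path) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        let meta = entry?.metadata()?;
        if meta.is_file() && (meta.permissions().mode() & 0o111) != 0 {
            count += 1;
        }
    }
    Ok(count)
}
```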
Interestingly, passing the `--threads` flag to `llvm-cov export` appears to have no effect; at least I don't see any speed-up from it. That suggests `llvm-cov export` itself could also use some love to avoid this significant sequential cost in its execution.
I opened two PRs: one for parallelizing the invocations (https://github.com/mozilla/grcov/pull/1015) and one for recommending in the README that folks point at `/deps/` (https://github.com/mozilla/grcov/pull/1014).
Filed an issue with LLVM about `llvm-cov export` being slow: https://github.com/llvm/llvm-project/issues/62079
Not sure where to ask, but I can certainly help test things out.
I'm trying to set up grcov against a repository. Searching for binaries after running `cargo test` with `find -type f -executable -exec file -i '{}' \; | rg 'executable; charset=binary'` shows me 1146 results (intermediate binaries and those used for tests, of course). Of those, only 111 reside in `./target/../deps`.
Firstly, I'm not really sure what is going on there, since grcov doesn't output any logs for me.
Then, should I target all these binaries? Or those in `./target/../deps`? Or just the main one? It's not explained anywhere. Also, `--binary-path` does not accept multiple inputs, and it's not documented how deep it looks for binaries (this is partly answered in https://github.com/mozilla/grcov/pull/1015#issuecomment-1507071173; I got almost all the files covered).
It might also be worth suggesting that people use a separate profile for coverage on their systems. For example, the `debug/deps` folder I was trying to run grcov on had over 40k files in it, which made grcov take ~12 minutes to run!
```toml
[profile.coverage]
inherits = "test"
```
Adding a `coverage` profile and doing a fresh test run dropped that number down to ~3k files in deps, which let grcov do what it needed to do in just over a minute!
This is extracted from the discussion in #326.
In some projects, some of the time, when I execute `grcov`, it takes a very long time to finish, using 100% of one core for several minutes at a time. I executed it like this:
I tried running `perf` on the `grcov` invocation from shortly after it started until I killed it, and the results were interesting:
A run of `strace -f -c` was also instructive:
It looks like a lot of time is spent just calling `llvm-cov export`, which then spends a lot of its time skipping/ignoring files. And a lot of time is spent on locking somewhere. Looking at what those `llvm-cov export` calls operate on, it looks like it's every file under `$X/cb-coverage-target/debug`:
(note that I killed the process at this point; I assume it would have continued going through all the files)
This... feels like a problem.