grcov sometimes takes very long to execute

jonhoo commented 1 year ago

This is extracted from the discussion in #326.

In some projects, some of the time, when I execute grcov, it takes a very long time to execute, using 100% of one core for several minutes at the time. I executed it like this:

$ pgrep -a grcov
61819 grcov $X/cb-coverage-target/profraw --source-dir . --binary-path $X/cb-coverage-target/debug/ --branch --ignore-not-existing <a few --excl-* args> --keep-only src/* --keep-only src/* --keep-only src/bin/* -t html -o $X/coverage/
$ ls -la $X/cb-coverage-target/profraw
.rw-r--r-- 3.1M jongje 11 Apr 16:45 61284-8806487336122761277_0.profraw
.rw-r--r-- 2.6M jongje 11 Apr 16:45 61289-8403376454973083594_0.profraw
.rw-r--r--  24k jongje 11 Apr 16:45 61296-16724270415138201609_0.profraw
.rw-r--r-- 5.5M jongje 11 Apr 16:45 61297-4578242808341374556_0.profraw
.rw-r--r--  224 jongje 11 Apr 16:45 61378-18005603889831348665_0.profraw
.rw-r--r-- 4.6M jongje 11 Apr 16:45 61379-12535087358149662424_0.profraw
.rw-r--r-- 5.4M jongje 11 Apr 16:45 61458-13306843052265737977_0.profraw
.rw-r--r-- 5.5M jongje 11 Apr 16:45 61573-2063728991251543943_0.profraw
.rw-r--r-- 5.4M jongje 11 Apr 16:45 61659-2807099239873412591_0.profraw
.rw-r--r-- 336k jongje 11 Apr 16:45 61742-2870874979089868175_0.profraw
.rw-r--r-- 312k jongje 11 Apr 16:45 61742-5935495699248178733_0.profraw
.rw-r--r-- 543k jongje 11 Apr 16:45 61742-6234157094333981707_0.profraw
.rw-r--r-- 206k jongje 11 Apr 16:45 61742-8110670833310045757_0.profraw
.rw-r--r-- 8.3k jongje 11 Apr 16:45 61742-8156326955437846749_0.profraw
.rw-r--r--  27k jongje 11 Apr 16:45 61742-13964213668791941633_0.profraw
.rw-r--r-- 323k jongje 11 Apr 16:45 61742-15978164622266936944_0.profraw
$ ls -la $X/cb-coverage-target/debug/
.rw-r--r--    0 jongje 11 Apr 16:20 .cargo-lock
drwxr-xr-x    - jongje 11 Apr 16:20 .fingerprint
drwxr-xr-x    - jongje 11 Apr 16:20 build
drwxr-xr-x    - jongje 11 Apr 16:21 deps
drwxr-xr-x    - jongje 11 Apr 16:20 examples
drwxr-xr-x    - jongje 11 Apr 16:21 incremental
.rwxr-xr-x 329M jongje 11 Apr 16:21 binary-1
.rwxr-xr-x 379M jongje 11 Apr 16:21 binary-2

I tried running perf on the grcov invocation from shortly after it started until I killed it, and the results were interesting:

-   69.78%     0.00%  llvm-cov    llvm-cov                          [.] (anonymous namespace)::CodeCoverageTool::run
   - (anonymous namespace)::CodeCoverageTool::run
      - 64.30% (anonymous namespace)::CodeCoverageTool::run
         - 63.89% llvm::CoverageExporterLcov::renderRoot
            - 63.50% llvm::CoverageExporterLcov::renderRoot
               - 62.60% llvm::coverage::FunctionRecordIterator::skipOtherFiles
                   16.62% bcmp
      - 5.47% (anonymous namespace)::CodeCoverageTool::load

A run of strace -f -c was also instructive:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.50   23.663735        1316     17972        18 futex
 30.72   13.098805      127172       103           wait4
  5.21    2.223218          27     79948     23060 read
  3.42    1.457714          21     68292           write
  2.19    0.934105          40     23164           ppoll
  0.58    0.247434          17     14333           rt_sigaction
  0.51    0.215750          34      6283      5047 openat
  0.42    0.180645          18     10009           brk
  0.31    0.130493          38      3401           mmap
  0.26    0.112478          31      3605      2884 newfstatat
  0.23    0.098308          31      3158           mprotect
  0.12    0.051731          31      1648           close
  0.11    0.044941          31      1442           fstat
  0.10    0.043112          53       803           munmap
  0.06    0.024312          26       930           rt_sigprocmask
  0.05    0.021149          29       721           prlimit64
  0.04    0.019140         184       104           mremap
  0.02    0.009636          46       206           clone
  0.02    0.009387          30       309           ioctl
  0.02    0.008547          27       309           dup3
  0.02    0.008122          39       206           sigaltstack
  0.02    0.006905          33       206           pipe2
  0.01    0.006072          29       206           set_robust_list
  0.01    0.006001          29       206       206 lseek
  0.01    0.005197          50       103           execve
  0.01    0.003782          36       103           readlinkat
  0.01    0.003593          34       103           madvise
  0.01    0.003435          33       103           set_tid_address
  0.01    0.002718          26       103       103 faccessat
  0.00    0.000000           0         1           tgkill
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0         1           gettid
------ ----------- ----------- --------- --------- ----------------
100.00   42.640465                238082     31318 total

It looks like a lot of time is spent just calling llvm-cov export, which then spends a lot of its time skipping/ignoring files. And a lot of time is spend on locking somewhere. Looking at what those llvm-cov exports operate on, it looks like it's every file under $X/cb-coverage-target/debug:

$X/cb-coverage-target/debug/new-crate-yanker
$X/cb-coverage-target/debug/build/serde_derive-998ef328dc8205c7/build_script_build-998ef328dc8205c7
$X/cb-coverage-target/debug/build/serde_derive-998ef328dc8205c7/build-script-build
$X/cb-coverage-target/debug/build/futures-task-a43cf798c741c1e5/build_script_build-a43cf798c741c1e5
$X/cb-coverage-target/debug/build/futures-task-a43cf798c741c1e5/build-script-build
$X/cb-coverage-target/debug/build/semver-587cd36212a42b68/build_script_build-587cd36212a42b68
$X/cb-coverage-target/debug/build/semver-587cd36212a42b68/build-script-build
$X/cb-coverage-target/debug/build/httparse-1b4d98bc49e8be40/build_script_build-1b4d98bc49e8be40
$X/cb-coverage-target/debug/build/httparse-1b4d98bc49e8be40/build-script-build
$X/cb-coverage-target/debug/build/rayon-core-ebfec758998913f3/build_script_build-ebfec758998913f3
$X/cb-coverage-target/debug/build/rayon-core-ebfec758998913f3/build-script-build
$X/cb-coverage-target/debug/build/slab-5b8ce3de708fe0c7/build_script_build-5b8ce3de708fe0c7
$X/cb-coverage-target/debug/build/slab-5b8ce3de708fe0c7/build-script-build
$X/cb-coverage-target/debug/build/proc-macro2-7dff807bfcb44bc2/build-script-build
$X/cb-coverage-target/debug/build/proc-macro2-7dff807bfcb44bc2/build_script_build-7dff807bfcb44bc2
$X/cb-coverage-target/debug/build/curl-6d21905df01ab6dc/build-script-build
$X/cb-coverage-target/debug/build/curl-6d21905df01ab6dc/build_script_build-6d21905df01ab6dc
$X/cb-coverage-target/debug/build/signal-hook-d988d2a242379f7e/build_script_build-d988d2a242379f7e
$X/cb-coverage-target/debug/build/signal-hook-d988d2a242379f7e/build-script-build
$X/cb-coverage-target/debug/build/anyhow-f34f4b00a66b8eb6/build_script_build-f34f4b00a66b8eb6
$X/cb-coverage-target/debug/build/anyhow-f34f4b00a66b8eb6/build-script-build
$X/cb-coverage-target/debug/build/libz-sys-65752cf9edf4eafd/build_script_build-65752cf9edf4eafd
$X/cb-coverage-target/debug/build/libz-sys-65752cf9edf4eafd/build-script-build
$X/cb-coverage-target/debug/build/libgit2-sys-0585917da9ed8876/build_script_build-0585917da9ed8876
$X/cb-coverage-target/debug/build/libgit2-sys-0585917da9ed8876/build-script-build
$X/cb-coverage-target/debug/build/unicase-eae804e298002f99/build_script_build-eae804e298002f99
$X/cb-coverage-target/debug/build/unicase-eae804e298002f99/build-script-build
$X/cb-coverage-target/debug/build/native-tls-84167741b66fd17f/build_script_build-84167741b66fd17f
$X/cb-coverage-target/debug/build/native-tls-84167741b66fd17f/build-script-build
$X/cb-coverage-target/debug/build/openssl-2794832cd06ea98d/build-script-build
$X/cb-coverage-target/debug/build/openssl-2794832cd06ea98d/build_script_build-2794832cd06ea98d
$X/cb-coverage-target/debug/build/libc-d343cd3686b51b54/build-script-build
$X/cb-coverage-target/debug/build/libc-d343cd3686b51b54/build_script_build-d343cd3686b51b54
$X/cb-coverage-target/debug/build/crc32fast-2813e5891a11c7ef/build-script-build
$X/cb-coverage-target/debug/build/crc32fast-2813e5891a11c7ef/build_script_build-2813e5891a11c7ef
$X/cb-coverage-target/debug/build/ahash-16b0f1e509e62802/build_script_build-16b0f1e509e62802
$X/cb-coverage-target/debug/build/ahash-16b0f1e509e62802/build-script-build
$X/cb-coverage-target/debug/build/rustversion-590d0d1f54eb250a/build_script_build-590d0d1f54eb250a
$X/cb-coverage-target/debug/build/rustversion-590d0d1f54eb250a/build-script-build
$X/cb-coverage-target/debug/build/futures-core-077e504f33c63998/build_script_build-077e504f33c63998
$X/cb-coverage-target/debug/build/futures-core-077e504f33c63998/build-script-build
$X/cb-coverage-target/debug/build/crossbeam-epoch-39b61d173fab0739/build_script_build-39b61d173fab0739
$X/cb-coverage-target/debug/build/crossbeam-epoch-39b61d173fab0739/build-script-build
$X/cb-coverage-target/debug/build/crossbeam-queue-0e851ae96f377934/build-script-build
$X/cb-coverage-target/debug/build/crossbeam-queue-0e851ae96f377934/build_script_build-0e851ae96f377934
$X/cb-coverage-target/debug/build/curl-sys-6dd92ae058ac86d5/build_script_build-6dd92ae058ac86d5
$X/cb-coverage-target/debug/build/curl-sys-6dd92ae058ac86d5/build-script-build
$X/cb-coverage-target/debug/build/tokio-db808a7b9dd0a4d7/build_script_build-db808a7b9dd0a4d7
$X/cb-coverage-target/debug/build/tokio-db808a7b9dd0a4d7/build-script-build
$X/cb-coverage-target/debug/build/futures-util-07362946f00908de/build-script-build
$X/cb-coverage-target/debug/build/futures-util-07362946f00908de/build_script_build-07362946f00908de
$X/cb-coverage-target/debug/build/memchr-88b0eb3a56660b40/build-script-build
$X/cb-coverage-target/debug/build/memchr-88b0eb3a56660b40/build_script_build-88b0eb3a56660b40
$X/cb-coverage-target/debug/build/indexmap-b7d01c653a2ccc7e/build_script_build-b7d01c653a2ccc7e
$X/cb-coverage-target/debug/build/indexmap-b7d01c653a2ccc7e/build-script-build
$X/cb-coverage-target/debug/build/parking_lot_core-6fd58c973a766b08/build_script_build-6fd58c973a766b08
$X/cb-coverage-target/debug/build/parking_lot_core-6fd58c973a766b08/build-script-build
$X/cb-coverage-target/debug/build/log-0b30c4bda57051c7/build_script_build-0b30c4bda57051c7
$X/cb-coverage-target/debug/build/log-0b30c4bda57051c7/build-script-build
$X/cb-coverage-target/debug/build/thiserror-17150905cea044c2/build-script-build
$X/cb-coverage-target/debug/build/thiserror-17150905cea044c2/build_script_build-17150905cea044c2
$X/cb-coverage-target/debug/build/num-traits-54ae66f0cf07f718/build_script_build-54ae66f0cf07f718
$X/cb-coverage-target/debug/build/num-traits-54ae66f0cf07f718/build-script-build
$X/cb-coverage-target/debug/build/mime_guess-879a91c975411707/build_script_build-879a91c975411707
$X/cb-coverage-target/debug/build/mime_guess-879a91c975411707/build-script-build
$X/cb-coverage-target/debug/build/smartstring-cc9df01996da7da8/build_script_build-cc9df01996da7da8
$X/cb-coverage-target/debug/build/smartstring-cc9df01996da7da8/build-script-build
$X/cb-coverage-target/debug/build/openssl-sys-da57f73f42886402/build-script-main
$X/cb-coverage-target/debug/build/openssl-sys-da57f73f42886402/build_script_main-da57f73f42886402
$X/cb-coverage-target/debug/build/futures-channel-2f1a56bb895e8af9/build_script_build-2f1a56bb895e8af9
$X/cb-coverage-target/debug/build/futures-channel-2f1a56bb895e8af9/build-script-build
$X/cb-coverage-target/debug/build/io-lifetimes-eacbf0bc3c7ad239/build-script-build
$X/cb-coverage-target/debug/build/io-lifetimes-eacbf0bc3c7ad239/build_script_build-eacbf0bc3c7ad239
$X/cb-coverage-target/debug/build/serde_json-eabd4439168aa9b1/build-script-build
$X/cb-coverage-target/debug/build/serde_json-eabd4439168aa9b1/build_script_build-eabd4439168aa9b1
$X/cb-coverage-target/debug/build/serde-f693450158e32a9e/build_script_build-f693450158e32a9e
$X/cb-coverage-target/debug/build/serde-f693450158e32a9e/build-script-build
$X/cb-coverage-target/debug/build/syn-85b699ae17d6164c/build_script_build-85b699ae17d6164c
$X/cb-coverage-target/debug/build/syn-85b699ae17d6164c/build-script-build
$X/cb-coverage-target/debug/build/rustix-5c4de8728bdbf71b/build_script_build-5c4de8728bdbf71b
$X/cb-coverage-target/debug/build/rustix-5c4de8728bdbf71b/build-script-build
$X/cb-coverage-target/debug/build/memoffset-b33bc06809024b5e/build_script_build-b33bc06809024b5e
$X/cb-coverage-target/debug/build/memoffset-b33bc06809024b5e/build-script-build
$X/cb-coverage-target/debug/build/crossbeam-utils-32f39ee2fcb887c9/build_script_build-32f39ee2fcb887c9
$X/cb-coverage-target/debug/build/crossbeam-utils-32f39ee2fcb887c9/build-script-build
$X/cb-coverage-target/debug/build/lock_api-6e4cdfa674a25fa0/build_script_build-6e4cdfa674a25fa0
$X/cb-coverage-target/debug/build/lock_api-6e4cdfa674a25fa0/build-script-build
$X/cb-coverage-target/debug/build/quote-70374ff0c4ca035a/build_script_build-70374ff0c4ca035a
$X/cb-coverage-target/debug/build/quote-70374ff0c4ca035a/build-script-build
$X/cb-coverage-target/debug/deps/rust_crate_importer-e420b5b99e68eb77
$X/cb-coverage-target/debug/deps/libclap_derive-fca315fae511eb9e.so
$X/cb-coverage-target/debug/deps/librustversion-ec02b5f6658254de.so
$X/cb-coverage-target/debug/deps/new_crate_yanker-f49c8bbfc5d517aa
$X/cb-coverage-target/debug/deps/curation-0f7343d133e87b27
$X/cb-coverage-target/debug/deps/libtracing_attributes-989634260b114420.so
$X/cb-coverage-target/debug/deps/libmaybe_async-a28baa601652d5df.so
$X/cb-coverage-target/debug/deps/libtime_macros-822cef6009cd293e.so
$X/cb-coverage-target/debug/deps/incremental-cf7c52bc74e38f80
$X/cb-coverage-target/debug/deps/rust_crate_importer-551d7f791c701471
$X/cb-coverage-target/debug/deps/test_the_tester-e26c6cb57748a830
$X/cb-coverage-target/debug/deps/full-46e8b31f1f94165f
$X/cb-coverage-target/debug/deps/libmockall_derive-a24c2513b5c405b2.so
$X/cb-coverage-target/debug/deps/errors-b0d705018f3b8ad2
$X/cb-coverage-target/debug/deps/amzn_crate_importer-dd4b9c8c73acb3be
$X/cb-coverage-target/debug/deps/libserde_derive-37214400c3ba75c1.so
$X/cb-coverage-target/debug/deps/libopenssl_macros-c6bede717dd9727c.so
$X/cb-coverage-target/debug/deps/metrics-088bee4901b1d3a9
$X/cb-coverage-target/debug/deps/libthiserror_impl-e92b5bbc64cc236b.so
$X/cb-coverage-target/debug/deps/new_crate_yanker-8f4c215c7ebf15b3
$X/cb-coverage-target/debug/incremental/binary-1-2os3jb1l9jubw/s-gjy2fuhsin-1sm9w5a-2z39ohcat1cfr/3te1m9n4u75hrpc9.o
$X/cb-coverage-target/debug/incremental/binary-1-2os3jb1l9jubw/s-gjy2fuhsin-1sm9w5a-2z39ohcat1cfr/2uq2flx1zjf0d2q5.o
$X/cb-coverage-target/debug/incremental/binary-1-2os3jb1l9jubw/s-gjy2fuhsin-1sm9w5a-2z39ohcat1cfr/41krjfy51yrkbkl4.o
$X/cb-coverage-target/debug/incremental/binary-1-2os3jb1l9jubw/s-gjy2fuhsin-1sm9w5a-2z39ohcat1cfr/3uabpapqrvv81tbe.o
$X/cb-coverage-target/debug/incremental/binary-1-2os3jb1l9jubw/s-gjy2fuhsin-1sm9w5a-2z39ohcat1cfr/4ue7u6ynict5g6id.o
$X/cb-coverage-target/debug/incremental/binary-1-2os3jb1l9jubw/s-gjy2fuhsin-1sm9w5a-2z39ohcat1cfr/1r82valv2likdyag.o
$X/cb-coverage-target/debug/incremental/binary-1-2os3jb1l9jubw/s-gjy2fuhsin-1sm9w5a-2z39ohcat1cfr/4g59v215a3qpy4y4.o
$X/cb-coverage-target/debug/incremental/binary-1-2os3jb1l9jubw/s-gjy2fuhsin-1sm9w5a-2z39ohcat1cfr/2a1pd6dtqtww7n8x.o
$X/cb-coverage-target/debug/incremental/binary-1-2os3jb1l9jubw/s-gjy2fuhsin-1sm9w5a-2z39ohcat1cfr/4eq4pf27tzjj617h.o

(note that I killed the process at this point — I assume it would have continued going through all the files)

This... feels like a problem.

jonhoo commented 1 year ago

Looks like the incremental directory is really causing us problems because it includes lots of files for every output target (binaries, the crate itself, and integration tests): https://gist.github.com/jonhoo/ef66c75137b1f9a14c322faa747b8cc1

I also ran perf on grcov after all the llvm-cov export invocations have finished (but grcov is still running), and it shows a significant portion of execution time being spent in calls to realpath via std::fs::canonicalize through grcov::path_rewriting::canonicalize_path in grcov::add_results (called from grcov::consumer).

jonhoo commented 1 year ago

cc @jerel

jonhoo commented 1 year ago

@marco-c I'm happy to try to help in fixing this, but could use some pointers for how you think this should be handled. For example, would it be okay to just always ignore everything that's in /incremental/?

jerel commented 1 year ago

Thanks for splitting this out and digging into it. What you've documented makes sense with what I've seen in practice. As supporting evidence my workaround solution completes quickly and (I think) does the same work but avoids enumeration of target/debug/**:

TEST_BIN_PATHS=$( \
  cargo test --no-run --message-format=json --tests --lib \
    | grep "{\"" \
    | jq -r "select(.profile.test == true) | .filenames[]" \
    | grep -v dSYM - \
)

rust-profdata merge -sparse ./*.profraw -o combined.profdata

rust-cov export \
  --ignore-filename-regex='/.cargo/registry|.cargo/git|/rustc|target/debug' \
  --instr-profile=combined.profdata \
  --object target/debug/libmy_app.so $TEST_BIN_PATHS \
  --format lcov > lcov-combined.info

grcov lcov-combined.info -s . -o ./html -t html --ignore-not-existing --llvm

If I'm not mistaken all of the binaries that grcov needs to load are at the top level of target/debug/* so maybe restricting the search to top level could work?

jonhoo commented 1 year ago

I think that kind of filtering would end up happening here so we don't even recurse down into that dir: https://github.com/mozilla/grcov/blob/917494ab5bcdfd675d435922f96ac08d7869aad0/src/producer.rs#L171-L196

jonhoo commented 1 year ago

The challenge in doing it in grcov is that we don't know it's being pointed at a Rust build directory. I suppose we could look for /{debug,release}/incremental/ and exclude that, but it feels like we may instead want to add a command-line argument to grcov to specifically exclude directories.

jonhoo commented 1 year ago

So something like an --exclude flag that feeds into a filter_entry call on the WalkDir.

jonhoo commented 1 year ago

Ah, wait, no, that's for walking paths, whereas what we need to filter is the stuff in --binary-path... Hmm..

jonhoo commented 1 year ago

It's over here we'll need to make a change: https://github.com/mozilla/grcov/blob/917494ab5bcdfd675d435922f96ac08d7869aad0/src/llvm_tools.rs#L102-L130

jerel commented 1 year ago

If it's helpful to your exploration I had some uncommitted work from several weeks ago where I was experimenting with accepting binary-path as a vec of binaries and then passing them in to export as --object args https://github.com/mozilla/grcov/compare/master...jerel:multi-binary-path

IIRC it didn't solve the perf issue by itself (in hindsight probably because it was still walking the target/debug/ somewhere else).

jonhoo commented 1 year ago

Oh, that's interesting. You should open that as a PR independently of this; I think it's a good change in isolation.

jonhoo commented 1 year ago

btw, the places you have &vec![...] I think you can just do &[...] instead.

jonhoo commented 1 year ago

I think we have a couple of paths forward:

Special-case exclude /{debug,release}/incremental/.
Support a glob-based --exclude-binaries flag to "subtract" from the directory set chosen by --binary-path.
Special-case exclude .o files.
Only include files marked executable.
Recommend that users pass target/{debug,release}/deps to --binary-path instead of target/{debug,release}.

I don't know which of these paths is better, and we may need the maintainers to chime in on that.

I also think it'd be nice if grcov -v logged every file it runs through https://github.com/mozilla/grcov/blob/917494ab5bcdfd675d435922f96ac08d7869aad0/src/llvm_tools.rs#L122-L132

as that would highlight problems in this area much faster.

marco-c commented 1 year ago

The ideal solution is probably fixing #535, basically kind of reproducing the way llvm-cov ignores files.

1. Special-case exclude `/{debug,release}/incremental/`.

I'm not a fan of special cases, they might silently break things for uncommon use cases.

2. Support a glob-based `--exclude-binaries` flag to "subtract" from the directory set chosen by `--binary-path`.

This would make things more complex for users of grcov. I'd rather we did the right thing by default.

3. Special-case exclude `.o` files.

Same as 1.

4. Only include files marked executable.

With this, we'd risk missing some coverage (e.g. from shared objects).

5. Recommend that users pass `target/{debug,release}/deps` to `--binary-path` instead of `target/{debug,release}`.

Might work, hopefully it doesn't break anything, but I'm not 100% sure.

jonhoo commented 1 year ago

Since 5 was easy to try out, I went that path (so no more .o files), but it's still very slow (multiple minutes) to execute, so I think there's a deeper problem here. An strace -f -c of the grcov execution gave:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.58 1102.814225       19947     55286      2332 futex
 11.71  164.292082         606    271057           sched_yield
  7.83  109.948569         157    697683       263 newfstatat
  1.08   15.155137      797638        19           wait4
  0.20    2.862066          28    100751     29369 read
  0.18    2.581477          29     86427           mprotect
  0.14    2.001592          21     93083           write

And here's the perf report:

-   59.39%     0.00%  llvm-cov         llvm-cov                          [.] (anonymous namespace)::CodeCoverageTool::run
   - (anonymous namespace)::CodeCoverageTool::run
      - 55.27% (anonymous namespace)::CodeCoverageTool::run
         - 54.95% llvm::CoverageExporterLcov::renderRoot
            - 54.63% llvm::CoverageExporterLcov::renderRoot
               - 53.91% llvm::coverage::FunctionRecordIterator::skipOtherFiles
                    14.34% bcmp
      - 4.12% (anonymous namespace)::CodeCoverageTool::load
         - 4.11% (anonymous namespace)::CodeCoverageTool::load
            - 3.47% llvm::coverage::CoverageMapping::load
               + 1.77% llvm::coverage::BinaryCoverageReader::create
                 1.22% llvm::coverage::CoverageMapping::loadFunctionRecord
            + 0.64% llvm::coverage::CoverageMapping::loadFunctionRecord
-   21.05%     0.01%  Consumer 0       grcov                             [.] grcov::consumer
   - 21.04% grcov::consumer
      - 11.91% grcov::llvm_tools::profraws_to_lcov
         - 11.91% grcov::llvm_tools::run (inlined)
            - 11.91% std::process::Command::output (inlined)
               - 11.90% core::result::Result<T,E>::and_then (inlined)
                    std::process::Command::output::_$u7b$$u7b$closure$u7d$$u7d$::h29c88bdea84f03c7 (inlined)
                  - std::process::Child::wait_with_output
                     + 11.90% std::sys::unix::pipe::read2 (inlined)
      - 5.76% grcov::parser::parse_lcov
         - 4.09% core::iter::traits::iterator::Iterator::collect (inlined)
            - <alloc::string::String as core::iter::traits::collect::FromIterator<char>>::from_iter
               + 4.07% <alloc::string::String as core::iter::traits::collect::Extend<char>>::extend (inlined)
         + 0.54% std::collections::hash::map::HashMap<K,V,S>::insert (inlined)
      - 3.37% grcov::add_results (inlined)
         - 2.48% grcov::path_rewriting::canonicalize_path (inlined)
            + std::fs::canonicalize (inlined)
           0.79% grcov::merge_results

10% of execution time is spent in memset as part of the read in Child::wait_with_output.

jonhoo commented 1 year ago

So the pipe here is definitely a bottleneck. But the huge amount of cycles spent on skipOtherFiles in llvm-cov also feels pretty concerning.

jonhoo commented 1 year ago

It seems like a fairly straightforward method: https://github.com/llvm-mirror/llvm/blob/2c4ca6832fa6b306ee6a7010bfb80a3f2596f824/lib/ProfileData/Coverage/CoverageMapping.cpp#L189-L195

But it also appears to do a linear search, which could certainly cause problems for even modestly sized inputs.

jonhoo commented 1 year ago

One obvious win here would be to run llvm-cov export in parallel. Doesn't solve the fact that llvm-cov export itself is seemingly fairly slow here (~4s per invocation), but at least it avoids the wait time being 4s times the number of binaries in target/debug/deps (in my case, there are 20).

jonhoo commented 1 year ago

Just invoking llvm-cov export on binaries through rayon gave me a huge speedup, even if I don't specifically subset to /deps/. Overall:

Benchmark 1: ./original
  Time (mean _ _):     71.804 s _  0.373 s    [User: 68.919 s, System: 17.522 s]
  Range (min _ max):   71.182 s _ 72.400 s    10 runs

Benchmark 2: ./original-deps
  Time (mean _ _):     44.957 s _  0.336 s    [User: 50.187 s, System: 7.719 s]
  Range (min _ max):   44.540 s _ 45.441 s    10 runs

Benchmark 3: ./parallel
  Time (mean _ _):     20.869 s _  0.107 s    [User: 121.422 s, System: 43.629 s]
  Range (min _ max):   20.749 s _ 21.037 s    10 runs

Benchmark 4: ./parallel-deps
  Time (mean _ _):     14.561 s _  0.054 s    [User: 94.777 s, System: 8.153 s]
  Range (min _ max):   14.474 s _ 14.637 s    10 runs

Summary
  './parallel-deps' ran
    1.43 _ 0.01 times faster than './parallel'
    3.09 _ 0.03 times faster than './original-deps'
    4.93 _ 0.03 times faster than './original'

jerel commented 1 year ago

The project I'm on has 1,400 items in target/debug/deps/ so that 4s is noteworthy

jonhoo commented 1 year ago

You can ignore all the .d files though — grcov will filter those out eagerly. What matters is how many binary files you have in debug/deps (the number of executable files there is a reasonable proxy number). But still, yes.

jonhoo commented 1 year ago

Interestingly, passing the --threads flag to llvm-cov export appears to not have any effect. At least I don't see any speed-up from it. Which suggests that llvm-cov export itself could probably also use some love to avoid this significant sequential cost as part of its execution.

jonhoo commented 1 year ago

I opened two PRs, one for the parallelization of invocations (https://github.com/mozilla/grcov/pull/1015) and one for recommending folks point at /deps/ in the README (https://github.com/mozilla/grcov/pull/1014).

jonhoo commented 1 year ago

Filed an issue with LLVM about llvm-cov export being slow: https://github.com/llvm/llvm-project/issues/62079

TriplEight commented 1 year ago

Not sure where to ask, but I certainly can help testing things out. I'm trying to setup grcov against a repository. Searching for binaries after running cargo test with find -type f -executable -exec file -i '{}' \; | rg 'executable; charset=binary' shows me 1146 results (intermediary and those used for tests binaries ofc). Of which, only 111 reside in ./target/../deps.

Firstly, I'm not really sure (as grcov doesn't output any logs for me) what is going on there. Then, should I target all these binaries? Or those in ./target/../deps? Or just the main one? It's not explained anywhere. Also, --binary-path does not accept multiple input and it's not mentioned how deep does it look for binaries (this is partly answered in https://github.com/mozilla/grcov/pull/1015#issuecomment-1507071173, I got almost all the files covered).

cole-abbeduto-particle commented 9 months ago

It might also be worth suggesting that people use a separate profile for coverage on their systems. For example, my debug/deps folder that I was trying to run grcov on has over 40k files in it which made it take ~12m to run!

[profile.coverage]
inherits = "test"

Adding a coverage profile and doing a fresh test run dropped that number down to ~3k deps files which let grcov do what it needed to do in just over a minute!

mozilla / grcov

grcov sometimes takes very long to execute #1013