mozilla / grcov

Rust tool to collect and aggregate code coverage data for multiple source files
Mozilla Public License 2.0
1.17k stars 148 forks source link

Parallelize llvm-cov invocations #1015

Closed jonhoo closed 1 year ago

jonhoo commented 1 year ago

llvm-cov can be fairly slow per invocation, but can safely and easily be parallelized for significant speed-ups when there are many binary artifacts.

See https://github.com/mozilla/grcov/issues/1013.

jonhoo commented 1 year ago

@marco-c Do you have a rough timeline for when you might cut another grcov release with this change?

marco-c commented 1 year ago

I will do it later today.

marco-c commented 1 year ago

I released v0.8.18.

jonhoo commented 1 year ago

Thank you!

TriplEight commented 1 year ago

I wanted to bring some feedback (cc https://github.com/mozilla/grcov/issues/1013#issuecomment-1506894495): running with --binary-path ./target/debug/ took 37 min and with --binary-path ./target/debug/deps/ 32 min. (Some help with reducing this time by 30 minutes is highly appreciated though!)

The results (with deps/) seem to exclude only dirs with build.rs of the binaries and the coverage looks very similar.

jonhoo commented 1 year ago

Thanks for reporting back! Are those numbers with this change or without it?

The results (with dirs/) seem to exclude only dirs with main.rs of the binaries

I assume you meant "with deps/"?

Also, can you clarify what you mean about that last part ("dirs with main.rs of the binaries")?

TriplEight commented 1 year ago

@jonhoo yeah, fixed dirs -> deps, and main.rs -> build.rs thanks.

Also, can you clarify what you mean about that last part ("dirs with main.rs of the binaries")?

grcov v0.8.18

I have two html reports, for --binary-path ./target/debug/ and for --binary-path ./target/debug/deps/

I meant build.rs, not main.rs files, that are missing.

$ diff --recursive html3 html3-deps | rg Only

Only in html3/node/overseer: examples # entire directory with index.html that points to minimal-example.rs.html that's not covered at all!
Only in html3/cli: build.rs.html
Only in html3/cli: index.html # points to build.rs above
Only in html3/node/core/pvf: build.rs.html
Only in html3/node/core/pvf: index.html # points to build.rs above
Only in html3/node/test/performance-test: build.rs.html
Only in html3/node/test/performance-test: index.html # points to build.rs above
Only in html3/parachain/test-parachains/adder: build.rs.html
Only in html3/parachain/test-parachains/adder: index.html # points to build.rs above
Only in html3/parachain/test-parachains/halt: build.rs.html
Only in html3/parachain/test-parachains/halt: index.html # points to build.rs above
Only in html3/parachain/test-parachains/undying: build.rs.html
Only in html3/parachain/test-parachains/undying: index.html # points to build.rs above
Only in html3/runtime/kusama: build.rs.html
Only in html3/runtime/kusama: index.html # points to build.rs above
Only in html3/runtime/polkadot: build.rs.html
Only in html3/runtime/polkadot: index.html # points to build.rs above
Only in html3/runtime/rococo: build.rs.html
Only in html3/runtime/rococo: index.html # points to build.rs above
Only in html3/runtime/test-runtime: build.rs.html
Only in html3/runtime/test-runtime: index.html # points to build.rs above
Only in html3/runtime/westend: build.rs.html
Only in html3/runtime/westend: index.html # points to build.rs above

All of those build.rs files contain only short fn main and all of them are considered 100% covered. But nonetheless.

TriplEight commented 1 year ago

also ran strace

$ sudo strace -f -c -p 505935

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 56,35 3576,450379        5173    691293         2 poll
 32,45 2059,676227        4859    423829      8087 futex
  6,39  405,785716         203   1992009    691119 read
  4,27  270,890421     1574944       172           wait4
  0,23   14,883007           6   2298014           write
  0,12    7,521074           2   3283938       161 lstat
  0,06    3,978903         857      4642           munmap
  0,05    2,892612           1   2011074           mprotect
  0,03    1,882019          15    118646           brk
  0,02    1,139236         766      1487           clone
  0,01    0,458284          37     12235           mmap
  0,00    0,302150          10     27812        15 statx
  0,00    0,291705         364       801           mremap
  0,00    0,224367          17     12645      3635 openat
  0,00    0,221059         148      1489           set_robust_list
  0,00    0,084905           3     23765           rt_sigaction
  0,00    0,071867         410       175         2 execve
  0,00    0,066057           6      9699           close
  0,00    0,061876          16      3659           sched_yield
  0,00    0,052891          40      1317           madvise
  0,00    0,049122          15      3088      2408 stat
  0,00    0,031974          26      1203           prlimit64
  0,00    0,031960           7      4121           fstat
  0,00    0,016638           8      1871           rt_sigprocmask
  0,00    0,015421          44       343           pipe2
  0,00    0,011500          64       178         5 readlink
  0,00    0,010457           3      3447           getdents64
  0,00    0,006938          17       387           sigaltstack
  0,00    0,006346          12       516         3 ioctl
  0,00    0,004303          12       347       339 lseek
  0,00    0,004014          20       193           sched_getaffinity
  0,00    0,003743          21       173       173 access
  0,00    0,003304           6       513           dup2
  0,00    0,001748          10       173           set_tid_address
  0,00    0,001694           9       173           arch_prctl
  0,00    0,000033           1        17           unlinkat
  0,00    0,000021           5         4           open
  0,00    0,000007           0        13           getcwd
  0,00    0,000005           1         3           getrandom
  0,00    0,000004           2         2           sysinfo
  0,00    0,000003           0         4           uname
  0,00    0,000002           2         1           prctl
  0,00    0,000002           0         3           clock_gettime
  0,00    0,000001           0        32           fcntl
  0,00    0,000000           0         2           restart_syscall
------ ----------- ----------- --------- --------- ------------------
100,00 6347,133995         580  10935508    705949 total
jonhoo commented 1 year ago

Oh, that's interesting that by only using deps we lose coverage of build scripts. That could probably be fixed once multiple --binary-path arguments are permitted by also adding a glob to target/**/cargo_build_script or something like it, but I think it's a worthwhile trade-off to reduce coverage time by 5 minutes!

I think the futex you see there is actually time spent synchronizing with the spawned llvm-cov invocations, and the way to get rid of those is to either improve llvm-cov or reduce the number of times we call it.

TriplEight commented 1 year ago

I'm pretty sure that there's a lot of what we usually don't need to measure in deps, would be cool to exclude this as well.