Create code coverage reports

mbautin commented 2 years ago

We need to generate code coverage reports. We can use Clang source-based coverage: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html

This will be a new build type (or multiple new build types, based on debug and release). The main compiler arguments are: -fprofile-instr-generate -fcoverage-mapping. Each process (unit tests, masters and tservers in external mini clusters) will then output "raw profiles" (.profraw) into locations specified by the LLVM_PROFILE_FILE environment variable. Then these raw profiles need to be "indexed" to create .profdata files using a command like llvm-profdata merge -sparse foo.profraw -o foo.profdata (I think the Clang docs call these files "coverage reports", although they can only be viewed using the llvm-cov show tool).

For proper integration into our testing pipeline, we need to ship per-test indexed reports to generate various kinds of reports (line-oriented vs. file-level summaries).

E.g. line-oriented:

    1|   20|#define BAR(x) ((x) || (x))
                           ^20     ^2
    2|    2|template <typename T> void foo(T x) {
    3|   22|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
                                   ^22     ^20  ^20^20
    4|    2|}
------------------
| void foo<int>(int):
|      2|    1|template <typename T> void foo(T x) {
|      3|   11|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
|                                     ^11     ^10  ^10^10
|      4|    1|}
------------------
| void foo<float>(int):
|      2|    1|template <typename T> void foo(T x) {
|      3|   11|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
|                                     ^11     ^10  ^10^10
|      4|    1|}
------------------

vs. file-level:

# Step 3(c): Create a coverage summary.
% llvm-cov report ./foo -instr-profile=foo.profdata
Filename           Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover     Branches    Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/tmp/foo.cc             13                 0   100.00%           3                 0   100.00%          13                 0   100.00%           12                  2    83.33%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                   13                 0   100.00%           3                 0   100.00%          13                 0   100.00%           12                  2    83.33%

We then need to decide where to upload these reports, how to host them, and how to compare code coverage between builds.

mbautin commented 2 years ago

To specify LLVM_PROFILE_FILES, we can use similar logic to how we currently name fatal failure reports, e.g.

$ find java -name "*.profraw"
java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgWriteRestart__testWriteRestart/org.yb.pgsql.TestPgWriteRestart.testWriteRestart..m-1.127.97.172.113-port11946.profraw
java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgWriteRestart__testWriteRestart/org.yb.pgsql.TestPgWriteRestart.testWriteRestart..m-2.127.122.93.35-port24351.profraw
java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgWriteRestart__testWriteRestart/org.yb.pgsql.TestPgWriteRestart.testWriteRestart..m-3.127.48.200.186-port28965.profraw
java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgWriteRestart__testWriteRestart/org.yb.pgsql.TestPgWriteRestart.testWriteRestart..ts-1.127.88.193.225-port19059.profraw
java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgWriteRestart__testWriteRestart/org.yb.pgsql.TestPgWriteRestart.testWriteRestart..ts-2.127.198.58.159-port19906.profraw
java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgWriteRestart__testWriteRestart/org.yb.pgsql.TestPgWriteRestart.testWriteRestart..ts-3.127.206.37.75-port18182.profraw

(there is a bug causing a double dot in the name).

When looking at these files' sizes, there are a couple of issues:

Files for masters are zero size. This is probably caused by not enough profile data being generated before the process is killed. We need some hook to flush profile files before killing the process, e.g. using these APIs.
Files for tservers in a single test can reach more than 1 GB (1179436792 bytes). We need to postprocess these files as part of collecting results after running the test.

def- commented 2 years ago

I'm now looking into this, rebased your change for now, and seems to work as it did before.

Files for masters are zero size. This is probably caused by not enough profile data being generated before the process is killed. We need some hook to flush profile files before killing the process, e.g. using these APIs.

I can reproduce this. My guess is we are missing the -fprofile-instr-generate -fcoverage-mapping during linking, then the writing out on exit handler is not done. Will try to fix in CMakeLists.txt

Edit: The linking step was not enough. In the test output I see that we are killing yb-master with SIGKILL, which means we can't catch and handle this. Let me check if I can fix this by waiting longer when we are in coverage builds, since writing out the coverage information will take some time:

I0222 14:12:01.467499 81167744 external_mini_cluster.cc:2226] { daemon_id: m-1 bound_rpc: :0 } Starting Shutdown()
I0222 14:12:01.467522 81167744 external_mini_cluster.cc:2269] Killing /Users/deen/code/yugabyte-db/build/debugcov-clang-dynamic-arm64-ninja/tests-pgwrapper/../bin/yb-master with pid 47052 with SIGKILL

As an aside, if we expect many crashes during testing and want more accurate coverage:

“%c” expands out to nothing, but enables a mode in which profile counter updates are continuously synced to a file. This means that if the instrumented program crashes, or is killed by a signal, perfect coverage information can still be recovered. Continuous mode does not support value profiling for PGO, and is only supported on Darwin at the moment. Support for Linux may be mostly complete but requires testing, and support for Windows may require more extensive changes: please get involved if you are interested in porting this feature.

But I wouldn't set it for now, I'm guessing Linux will be the target arch.

Files for tservers in a single test can reach more than 1 GB (1179436792 bytes). We need to postprocess these files as part of collecting results after running the test.

Can also see this, should compress well, so I will try that. Edit: xz -9 gives me 1.3 GB -> 101 MB.

Do we also want to instrument our postgres server or just tserver and master? Edit: I guess so.

def- commented 2 years ago

Edit: The linking step was not enough. In the test output I see that we are killing yb-master with SIGKILL, which means we can't catch and handle this. Let me check if I can fix this by waiting longer when we are in coverage builds, since writing out the coverage information will take some time.

This was caused by us using SIGTERM to terminate master processes (and sometimes tserver too). I fixed it by hooking into the glog signal handler to flush then.

There is a cleaner way by sending FlushCoverageRequestPB, which I made the C++ tests use, but wasn't sure how to do this in Java tests, and better to handle SIGTERM when we use it so frequently in tests (not a big fan of this approach in general though).

Do we also want to instrument our postgres server or just tserver and master? Edit: I guess so.

Done.

There was also the issue of tservers running multiple times even during one run and overwriting their old profraw files, added the pid to keep the old ones around until we merge them.

Now the next step is to create actual coverage report from this and check if it's reasonable. If so, then it would be interesting to run with all of our tests and get a unified coverage.

def- commented 2 years ago

With %c I can finally get reliable coverage. We are killing processes using signals left and right, handling all of them correctly is kind of difficult, and I have apparently not managed to do so yet. The SIGKILLS can't be handled at all obviously, so would have to be changed for coverage builds. I'll have to check if %c also works on Linux. report1.zip (sorry for double packing, github file size limit hit) An initial html report of ./yb_build.sh debugcov --java-test TestPgWriteRestart is attached.

There is a remaining issue of warning: 45415 functions have mismatched data which happens because of shared library usage. I will check if dependency_graph.py link-whole-program can help here to improve accuracy.

def- commented 2 years ago

Report with all ctest runs: I had to run this in parallel to keep the file size of profraw files down, otherwise my disk would fill up (multiple TB):

  i=0
  sleep 180
  while true; do
    sleep 60
    X=$(find . -name '*profraw*' -mtime +3m)
    /Library/Developer/CommandLineTools/usr/bin/llvm-profdata merge -sparse ${=X} -o ctest$i.profdata && rm ${=X}
    i=$((i+1))
  done

Then finally generated the report:

  /Library/Developer/CommandLineTools/usr/bin/llvm-profdata merge -sparse ctest*.profdata -o ctest_combined.profdata && rm ctest[0-9]*.profdata
  OBJS=$(for i in build/debugcov-clang-dynamic-arm64-ninja/postgres/bin/* build/debugcov-clang-dynamic-arm64-ninja/bin/* build/debugcov-clang-dynamic-arm64-ninja/tests-*/* build/debugcov-clang-dynamic-arm64-ninja/**/*.dylib; do echo -n "-object $i "; done)
  /Library/Developer/CommandLineTools/usr/bin/llvm-cov show ${=OBJS} --instr-profile ctest_combined.profdata --format=html --output-dir=ctest

Next I'm trying to do the same with all java tests.

def- commented 2 years ago

I'm struggling with getting all java tests to run, so far it always stops at some point. Is it possible that the java tests are not that stable on macOS in general? Is there a way to tell the test runner to keep going after failures and not skip the rest?

$ ./yb_build.sh debugcov --java-tests
[...]
[INFO] Running com.yugabyte.driver.core.TestPartitionMetadata
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 44.837 s - in com.yugabyte.driver.core.TestPartitionMetadata
[INFO] Running com.datastax.driver.core.TestSystemTables
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.347 s - in com.datastax.driver.core.TestSystemTables
[INFO]
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestLoadBalancingPolicy.testHashFunction:101
[ERROR]   TestCreateTable.testCreateTableNumTablets:219
[ERROR]   TestIndex.testInKeyword:2524->doTestInKeyword:2466->BaseCQLTest.assertQuery:578->BaseCQLTest.assertQuery:574 expected:<...4c93-8f61-eabfa4a803[e2, first, second]Row[5b6962dd-3f90-4c93-8f61-eabfa4a80310, NULL, NULL]]> but was:<...4c93-8f61-eabfa4a803[10, NULL, NULL]Row[5b6962dd-3f90-4c93-8f61-eabfa4a803e2, first, second]]>
[ERROR]   TestIndex.testInKeywordInPrepared:2534->doTestInKeyword:2466->BaseCQLTest.assertQuery:578->BaseCQLTest.assertQuery:574 expected:<...4c93-8f61-eabfa4a803[e2, first, second]Row[5b6962dd-3f90-4c93-8f61-eabfa4a80310, NULL, NULL]]> but was:<...4c93-8f61-eabfa4a803[10, NULL, NULL]Row[5b6962dd-3f90-4c93-8f61-eabfa4a803e2, first, second]]>
[ERROR]   TestIndex.testInKeywordInPrepared_Transactional:2540->doTestInKeyword:2466->BaseCQLTest.assertQuery:578->BaseCQLTest.assertQuery:574 expected:<...4c93-8f61-eabfa4a803[e2, first, second]Row[5b6962dd-3f90-4c93-8f61-eabfa4a80310, NULL, NULL]]> but was:<...4c93-8f61-eabfa4a803[10, NULL, NULL]Row[5b6962dd-3f90-4c93-8f61-eabfa4a803e2, first, second]]>
[ERROR]   TestIndex.testInKeyword_Transactional:2529->doTestInKeyword:2466->BaseCQLTest.assertQuery:578->BaseCQLTest.assertQuery:574 expected:<...4c93-8f61-eabfa4a803[e2, first, second]Row[5b6962dd-3f90-4c93-8f61-eabfa4a80310, NULL, NULL]]> but was:<...4c93-8f61-eabfa4a803[10, NULL, NULL]Row[5b6962dd-3f90-4c93-8f61-eabfa4a803e2, first, second]]>
[ERROR] Errors:
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestAudit>BaseCQLTest.setUpCqlClient:166->BaseCQLTest.connectWithTestDefaults:146 » NoHostAvailable
[ERROR]   TestDecimalDataType.testConversionsLimits:616 » InvalidQuery Invalid Arguments...
[ERROR]   TestVarIntDataType.testConversionsLimits:489 » InvalidQuery Invalid Arguments....
[INFO]
[ERROR] Tests run: 989, Failures: 6, Errors: 12, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for YugaByte 0.8.15-SNAPSHOT:
[INFO]
[INFO] YugaByte ........................................... SUCCESS [  1.352 s]
[INFO] YB Annotations ..................................... SUCCESS [  0.927 s]
[INFO] YB Java Client ..................................... SUCCESS [19:01 min]
[INFO] YB commandline interface ........................... SUCCESS [  0.753 s]
[INFO] YB CQL Support ..................................... FAILURE [  04:04 h]
[INFO] YB CQL Support for 4.x Driver ...................... SKIPPED
[INFO] YB PostgreSQL Support .............................. SKIPPED
[INFO] YB Load Tester ..................................... SKIPPED
[INFO] YB Jedis Tests ..................................... SKIPPED
[INFO] YB CDC Connector ................................... SKIPPED
[INFO] YB Multi API ....................................... SKIPPED
[INFO] YB Manual Support .................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  04:23 h
[INFO] Finished at: 2022-03-08T03:59:49+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.2:test (default-test) on project yb-cql: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/deen/code/yugabyte-db/java/yb-cql/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :yb-cql
[2022-03-08T03:59:49 common-build-env.sh:848 build_yb_java_code_filter_save_output] (End of Java build output)

real    263m37.143s
user    196m55.482s
sys 65m23.727s

def- commented 2 years ago

I'll try again with <yb.forked.test.process.timeout.sec>7200</yb.forked.test.process.timeout.sec> in java/pom.xml

def- commented 2 years ago

With ./yb_build.sh debugcov --java-tests --rjtms --tp 8 I appear to be getting further now, test running for > 12 hours now. Will hopefully get all the java tests now. Next I can add the automatic report generation to yb_build.

def- commented 2 years ago

Now finished the java tests: Java compilation and tests time : 76177 seconds (22 hours!) (34% line coverage) ctests and java tests combined: http://coverage.yugabyte.com.s3-website-us-west-2.amazonaws.com/reports/2.13.0.0-b42/index.html (45% line coverage)

def- commented 2 years ago

Note that there is still some inaccuracy in the reported coverage (warning: 689576 functions have mismatched data), but it is probably close enough to reality for our use. In my experience getting rid of this warning and to get more accurate coverage information, we'd have to link all of the code into a single static executable.

def- commented 2 years ago

Code coverage on macOS works, first one uploaded (should soon be public)

But it should work on Linux, since we have more infrastructure available on Linux. I have two approaches I am evaluating now in parallel:

Enable continuous mode in Clang/LLVM source code for Linux and build with my own version. Might need to fix some issues in LLVM directly for this. Found a failed assertion in current LLVM: https://github.com/llvm/llvm-project/issues/54408 so probably have to use an older LLVM version and check if it doesn't happen there
Catch signals as well as we can and flush coverage data in signal handlers, prevent SIGKILLs during coverage test runs. I have a WIP change for this and manage to catch most signals so far, but definitely not all yet. Right now I am running all tests on it to check how the coverage numbers look with this. Had a problem with this, tests were timing out after ~1 day, so trying again with more RAM on fresh dev server.

def- commented 2 years ago

I created another coverage report last week: http://coverage.yugabyte.com.s3-website-us-west-2.amazonaws.com/reports/2.13.1.0-b60/ Pending some internal discussions I'm pausing this project.

yugabyte / yugabyte-db

Create code coverage reports #10392