Open pcc opened 6 years ago
Sorry, no progress yet. I spent a bit of time on it and hit a dead end. Looking at it again.
Any updates on this?
There is a change that significantly affects the tsan fast path: https://reviews.llvm.org/D54889 I'm afraid of doing any measurements now that we know the fast path has already degraded. We could end up committing to something that has a negative effect later but is very hard to fix or remove.
Note that this change has affected code-gen of tsan runtime functions, so maybe the easier route will be to check difference in machine code: https://reviews.llvm.org/D54821
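One way to compare the machine code of a single runtime function across two builds is sketched below. This is only a sketch under assumptions: it expects the text produced by plain `objdump -d` (with raw instruction bytes shown) for each binary, and the helper names `extract_function` and `diff_function` are made up for illustration. Addresses are stripped so that only real instruction changes show up in the diff.

```python
import difflib
import re

# Function header in `objdump -d` output, e.g. "0000000000001130 <__tsan_read8>:"
_HEADER = re.compile(r"^[0-9a-fA-F]+ <(?P<name>[^>]+)>:\s*$")
# Instruction line: "<addr>:<TAB><raw bytes><TAB><mnemonic and operands>"
_INSN = re.compile(r"^\s*[0-9a-fA-F]+:\t(?P<bytes>[^\t]*)(?:\t(?P<text>.*))?$")
# Absolute branch/call target printed before a "<symbol+offset>" annotation
_TARGET = re.compile(r"\b[0-9a-fA-F]+ (<[^>]+>)")


def extract_function(disasm, name):
    """Return the normalized instructions of function `name` (hypothetical helper)."""
    insns = []
    inside = False
    for line in disasm.splitlines():
        header = _HEADER.match(line)
        if header:
            if inside:
                break  # the next function starts here
            inside = header.group("name") == name
            continue
        if not inside:
            continue
        m = _INSN.match(line)
        if m and m.group("text"):
            text = " ".join(m.group("text").split())  # collapse padding
            insns.append(_TARGET.sub(r"\1", text))    # drop absolute targets
    return insns


def diff_function(disasm_a, disasm_b, name):
    """Unified diff of one function's instructions between two disassemblies."""
    a = extract_function(disasm_a, name)
    b = extract_function(disasm_b, name)
    return list(difflib.unified_diff(a, b, "a/" + name, "b/" + name, lineterm=""))


if __name__ == "__main__":
    import sys
    # Usage (assumed): script.py good.objdump.txt bad.objdump.txt __tsan_read8
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        print("\n".join(diff_function(fa.read(), fb.read(), sys.argv[3])))
```

Symbol-relative offsets such as `<foo+0x10>` are left in place, so a shift in code layout will still appear as a difference in jump targets.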
Please make sure that both of your build trees are stage 2 builds. The issue manifests in how the compiler builds the tsan runtime library, and tsan uses $CXX to build its runtime library, not the just-built compiler.
You should be able to check that the unit test binaries are built correctly by running the compiler-rt/lib/tsan/check_analyze.sh script on both binaries; the test should fail for the "good" binary and pass for the "bad" one (or vice versa prior to my recent change to that file).
If that doesn't help let me know.
Peter, perhaps we can exchange executables: I'll see if yours have the performance effect on my machine, and vice versa. Are you using something close enough to Ubuntu 18.04 that we can make this work?
I'm having trouble reproducing this effect using r347444 on an i7-6950X.
Here are 3 runs of r347444:
regehr@john-home:~/llvm/build$ projects/compiler-rt/lib/tsan/tests/rtl/TsanRtlTest-x86_64-Test --gtest_also_run_disabled_tests --gtest_filter=DISABLED_BENCH.Mop8
Note: Google Test filter = DISABLED_BENCH.Mop8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from DISABLED_BENCH
[ RUN      ] DISABLED_BENCH.Mop8
[       OK ] DISABLED_BENCH.Mop8 (71 ms)
[ RUN      ] DISABLED_BENCH.Mop8Read
[       OK ] DISABLED_BENCH.Mop8Read (1025 ms)
[ RUN      ] DISABLED_BENCH.Mop8Write
[       OK ] DISABLED_BENCH.Mop8Write (955 ms)
[----------] 3 tests from DISABLED_BENCH (2051 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (2051 ms total)
[  PASSED  ] 3 tests.
regehr@john-home:~/llvm/build$ projects/compiler-rt/lib/tsan/tests/rtl/TsanRtlTest-x86_64-Test --gtest_also_run_disabled_tests --gtest_filter=DISABLED_BENCH.Mop8
Note: Google Test filter = DISABLED_BENCH.Mop8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from DISABLED_BENCH
[ RUN      ] DISABLED_BENCH.Mop8
[       OK ] DISABLED_BENCH.Mop8 (70 ms)
[ RUN      ] DISABLED_BENCH.Mop8Read
[       OK ] DISABLED_BENCH.Mop8Read (988 ms)
[ RUN      ] DISABLED_BENCH.Mop8Write
[       OK ] DISABLED_BENCH.Mop8Write (911 ms)
[----------] 3 tests from DISABLED_BENCH (1969 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (1971 ms total)
[  PASSED  ] 3 tests.
regehr@john-home:~/llvm/build$ projects/compiler-rt/lib/tsan/tests/rtl/TsanRtlTest-x86_64-Test --gtest_also_run_disabled_tests --gtest_filter=DISABLED_BENCH.Mop8
Note: Google Test filter = DISABLED_BENCH.Mop8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from DISABLED_BENCH
[ RUN      ] DISABLED_BENCH.Mop8
[       OK ] DISABLED_BENCH.Mop8 (79 ms)
[ RUN      ] DISABLED_BENCH.Mop8Read
[       OK ] DISABLED_BENCH.Mop8Read (1069 ms)
[ RUN      ] DISABLED_BENCH.Mop8Write
[       OK ] DISABLED_BENCH.Mop8Write (949 ms)
[----------] 3 tests from DISABLED_BENCH (2097 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (2097 ms total)
[  PASSED  ] 3 tests.
regehr@john-home:~/llvm/build$
And here are 3 runs with the r347379 patch backed out:
regehr@john-home:~/llvm-bad-lvi/build$ projects/compiler-rt/lib/tsan/tests/rtl/TsanRtlTest-x86_64-Test --gtest_also_run_disabled_tests --gtest_filter=DISABLED_BENCH.Mop8
Note: Google Test filter = DISABLED_BENCH.Mop8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from DISABLED_BENCH
[ RUN      ] DISABLED_BENCH.Mop8
[       OK ] DISABLED_BENCH.Mop8 (79 ms)
[ RUN      ] DISABLED_BENCH.Mop8Read
[       OK ] DISABLED_BENCH.Mop8Read (1021 ms)
[ RUN      ] DISABLED_BENCH.Mop8Write
[       OK ] DISABLED_BENCH.Mop8Write (896 ms)
[----------] 3 tests from DISABLED_BENCH (1997 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (1997 ms total)
[  PASSED  ] 3 tests.
regehr@john-home:~/llvm-bad-lvi/build$ projects/compiler-rt/lib/tsan/tests/rtl/TsanRtlTest-x86_64-Test --gtest_also_run_disabled_tests --gtest_filter=DISABLED_BENCH.Mop8
Note: Google Test filter = DISABLED_BENCH.Mop8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from DISABLED_BENCH
[ RUN      ] DISABLED_BENCH.Mop8
[       OK ] DISABLED_BENCH.Mop8 (80 ms)
[ RUN      ] DISABLED_BENCH.Mop8Read
[       OK ] DISABLED_BENCH.Mop8Read (1131 ms)
[ RUN      ] DISABLED_BENCH.Mop8Write
[       OK ] DISABLED_BENCH.Mop8Write (961 ms)
[----------] 3 tests from DISABLED_BENCH (2172 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (2172 ms total)
[  PASSED  ] 3 tests.
regehr@john-home:~/llvm-bad-lvi/build$ projects/compiler-rt/lib/tsan/tests/rtl/TsanRtlTest-x86_64-Test --gtest_also_run_disabled_tests --gtest_filter=DISABLED_BENCH.Mop8
Note: Google Test filter = DISABLED_BENCH.Mop8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from DISABLED_BENCH
[ RUN      ] DISABLED_BENCH.Mop8
[       OK ] DISABLED_BENCH.Mop8 (79 ms)
[ RUN      ] DISABLED_BENCH.Mop8Read
[       OK ] DISABLED_BENCH.Mop8Read (1064 ms)
[ RUN      ] DISABLED_BENCH.Mop8Write
[       OK ] DISABLED_BENCH.Mop8Write (937 ms)
[----------] 3 tests from DISABLED_BENCH (2081 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (2081 ms total)
[  PASSED  ] 3 tests.
regehr@john-home:~/llvm-bad-lvi/build$
There's a fair amount of variation across runs (this is with all cores using the "performance" cpufreq governor) but it doesn't look to me like backing out the patch gives a 10% speedup on this test.
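To make the comparison concrete, here is a small sketch that summarizes the six runs above. The timings are copied from the output; the dictionary and function names are only illustrative. Three samples per build is too few for any real statistical claim, so this is just the arithmetic.

```python
from statistics import mean

# Timings in ms, copied from the runs above (3 runs per build).
r347444 = {
    "Mop8Read": [1025, 988, 1069],
    "Mop8Write": [955, 911, 949],
}
backed_out = {
    "Mop8Read": [1021, 1131, 1064],
    "Mop8Write": [896, 961, 937],
}


def summarize(samples):
    """Return (mean, min, max) of a list of timings in ms."""
    return mean(samples), min(samples), max(samples)


def relative_change(before, after):
    """Fractional change of the mean when going from `before` to `after`."""
    return (mean(after) - mean(before)) / mean(before)


if __name__ == "__main__":
    for test in r347444:
        print(test)
        print("  r347444:    mean=%.1f min=%d max=%d" % summarize(r347444[test]))
        print("  backed out: mean=%.1f min=%d max=%d" % summarize(backed_out[test]))
        change = 100 * relative_change(r347444[test], backed_out[test])
        print("  change from backing out: %+.1f%%" % change)
```

On these six runs the mean Mop8Read time is about 4% higher with the patch backed out, and the mean Mop8Write time is under 1% lower. The min-max ranges of the two builds overlap for both tests.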
Any suggestions how to proceed here?
Thanks Peter, I'll dig into this later today, though the fault seems likely to lie on the TSan / codegen side (since the effect of r347379 is only to increase the precision of LVI a bit) and I don't know that code.
Extended Description
It seems that r347379 has had the effect of making the __tsan_read8 and __tsan_write8 functions about 10% slower. These functions are performance critical since they are called on every 64-bit load and store in a tsan-instrumented function. This was caught by the sanitizer-x86_64-linux-autoconf bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/31402/steps/tsan%20analyze/logs/stdio
I can observe the regression on my local machine like this:
$ projects/compiler-rt/lib/tsan/tests/rtl/TsanRtlTest-x86_64-Test --gtest_also_run_disabled_tests --gtest_filter=DISABLED_BENCH.Mop8
Note: Google Test filter = DISABLED_BENCH.Mop8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from DISABLED_BENCH
[ RUN      ] DISABLED_BENCH.Mop8
[       OK ] DISABLED_BENCH.Mop8 (74 ms)
[ RUN      ] DISABLED_BENCH.Mop8Read
[       OK ] DISABLED_BENCH.Mop8Read (820 ms)
[ RUN      ] DISABLED_BENCH.Mop8Write
[       OK ] DISABLED_BENCH.Mop8Write (817 ms)
[----------] 3 tests from DISABLED_BENCH (1711 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (1711 ms total)
[  PASSED  ] 3 tests.
$ projects/compiler-rt/lib/tsan/tests/rtl/TsanRtlTest-x86_64-Test1 --gtest_also_run_disabled_tests --gtest_filter=DISABLED_BENCH.Mop8
Note: Google Test filter = DISABLED_BENCH.Mop8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from DISABLED_BENCH
[ RUN      ] DISABLED_BENCH.Mop8
[       OK ] DISABLED_BENCH.Mop8 (91 ms)
[ RUN      ] DISABLED_BENCH.Mop8Read
[       OK ] DISABLED_BENCH.Mop8Read (905 ms)
[ RUN      ] DISABLED_BENCH.Mop8Write
[       OK ] DISABLED_BENCH.Mop8Write (889 ms)
[----------] 3 tests from DISABLED_BENCH (1885 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (1885 ms total)
[  PASSED  ] 3 tests.
and the timings are reasonably consistent between runs.
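For reference, the "about 10%" figure can be checked against the two runs above with a few lines of arithmetic. This is a sketch only: the labels `first` and `second` simply follow the order of the two binaries in the output, and `slowdown_percent` is an illustrative helper name.

```python
# Timings in ms from the two runs above: the first binary
# (TsanRtlTest-x86_64-Test) and the second (TsanRtlTest-x86_64-Test1).
first = {"Mop8": 74, "Mop8Read": 820, "Mop8Write": 817}
second = {"Mop8": 91, "Mop8Read": 905, "Mop8Write": 889}


def slowdown_percent(base_ms, other_ms):
    """Percentage by which `other_ms` exceeds `base_ms`."""
    return 100.0 * (other_ms - base_ms) / base_ms


if __name__ == "__main__":
    for test in first:
        pct = slowdown_percent(first[test], second[test])
        print("%-10s %4d ms -> %4d ms  (%+.1f%%)" % (test, first[test], second[test], pct))
```

On these numbers the second binary is roughly 10% slower on Mop8Read and roughly 9% slower on Mop8Write.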