llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

-runtime-counter-relocation=true causes pathological compile-time increase #83429

Open zmodem opened 7 months ago

zmodem commented 7 months ago

Attached is a reproducer from Chromium: formatutilsgl.ii.gz

Without runtime counter relocation it compiles in 14 s:

$ time clang++ -cc1 -triple thumbv7-unknown-linux-android26 -emit-obj -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -fmerge-all-constants -fno-delete-null-pointer-checks -mframe-pointer=none -relaxed-aliasing -ffp-contract=off -fno-rounding-math -mconstructor-aliases -funwind-tables=1 -target-cpu generic -target-feature +soft-float-abi -target-feature +vfp2 -target-feature +vfp2sp -target-feature +vfp3 -target-feature +vfp3d16 -target-feature +vfp3d16sp -target-feature +vfp3sp -target-feature -fp16 -target-feature -vfp4 -target-feature -vfp4d16 -target-feature -vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature -fp-armv8d16 -target-feature -fp-armv8d16sp -target-feature -fp-armv8sp -target-feature -fullfp16 -target-feature +fp64 -target-feature +d32 -target-feature +neon -target-feature -sha2 -target-feature -aes -target-feature -fp16fml -target-abi aapcs-linux -mfloat-abi soft -debugger-tuning=gdb -gsimple-template-names=simple -debug-forward-template-params -ffunction-sections -fdata-sections -fno-unique-section-names  -O2 -std=c++20 -fdeprecated-macro -fvisibility=hidden -fvisibility-inlines-hidden -femulated-tls -stack-protector 1 -ftrivial-auto-var-init=pattern -fno-rtti -fno-signed-char -fgnuc-version=4.2.1 -fno-implicit-modules -fskip-odr-check-in-gmf -Qn -vectorize-loops -vectorize-slp -mllvm -split-threshold-for-reg-with-hint=0 -fcomplete-member-pointers -o /tmp/a.o /tmp/formatutilsgl.ii -w -fprofile-instrument=clang -fcoverage-mapping -mllvm -limited-coverage-experimental=true                                        

real    0m14.071s
user    0m13.846s
sys     0m0.225s

With runtime counter relocation it takes 8 minutes:

$ time clang++ -cc1 -triple thumbv7-unknown-linux-android26 -emit-obj -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -fmerge-all-constants -fno-delete-null-pointer-checks -mframe-pointer=none -relaxed-aliasing -ffp-contract=off -fno-rounding-math -mconstructor-aliases -funwind-tables=1 -target-cpu generic -target-feature +soft-float-abi -target-feature +vfp2 -target-feature +vfp2sp -target-feature +vfp3 -target-feature +vfp3d16 -target-feature +vfp3d16sp -target-feature +vfp3sp -target-feature -fp16 -target-feature -vfp4 -target-feature -vfp4d16 -target-feature -vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature -fp-armv8d16 -target-feature -fp-armv8d16sp -target-feature -fp-armv8sp -target-feature -fullfp16 -target-feature +fp64 -target-feature +d32 -target-feature +neon -target-feature -sha2 -target-feature -aes -target-feature -fp16fml -target-abi aapcs-linux -mfloat-abi soft -debugger-tuning=gdb -gsimple-template-names=simple -debug-forward-template-params -ffunction-sections -fdata-sections -fno-unique-section-names  -O2 -std=c++20 -fdeprecated-macro -fvisibility=hidden -fvisibility-inlines-hidden -femulated-tls -stack-protector 1 -ftrivial-auto-var-init=pattern -fno-rtti -fno-signed-char -fgnuc-version=4.2.1 -fno-implicit-modules -fskip-odr-check-in-gmf -Qn -vectorize-loops -vectorize-slp -mllvm -split-threshold-for-reg-with-hint=0 -fcomplete-member-pointers -o /tmp/a.o /tmp/formatutilsgl.ii -w -fprofile-instrument=clang -fcoverage-mapping -mllvm -limited-coverage-experimental=true -mllvm -runtime-counter-relocation=true

real    8m15.434s
user    8m15.066s
sys     0m0.313s

(This uses Clang built at a0b3dbaf4b3c01dc7f0a83fce059a26360b58eb2)
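
To put the two measurements side by side, here is a small sketch that converts the `real` times reported above into seconds and computes the slowdown factor. The helper name `parse_time` is hypothetical and exists only for this illustration; the two durations are copied from the `time` output in this report.

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` duration such as '8m15.434s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# "real" times from the two runs in this report.
baseline = parse_time("0m14.071s")   # without -runtime-counter-relocation=true
relocated = parse_time("8m15.434s")  # with -runtime-counter-relocation=true

slowdown = relocated / baseline
print(f"baseline:  {baseline:.3f} s")
print(f"relocated: {relocated:.3f} s")
print(f"slowdown:  {slowdown:.1f}x")
```

On these numbers the build with the flag takes roughly 35 times as long. This is a single pair of runs, so the figure is indicative rather than a careful benchmark.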

zmodem commented 7 months ago

The person who filed this on our side said the time trace suggests the time is spent in ARM Instruction Selection.

llvmbot commented 7 months ago

@llvm/issue-subscribers-backend-arm

Author: Hans (zmodem)

chapuni commented 2 months ago

x86 has a similar issue as well. Updating counters from many cores in a hot loop will take us (and the processor) to hell.

For coverage, I suggest using a bitmap instead of counters. https://discourse.llvm.org/t/rfc-region-branch-coverage-by-bitmap/79629

Regarding single-byte counters, I guess they wouldn't work with -fcoverage-mcdc. As you know, it has a performance issue as well. https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685

llvmbot commented 2 months ago

@llvm/issue-subscribers-backend-x86

Author: Hans (zmodem)

chapuni commented 1 month ago

Sorry, I mistakenly thought this was an issue in coverage-instrumented binaries. I've removed X86.

I haven't reproduced this, since I cannot set up a Thumb environment.