Description of the Issue
I ran an extremely simple program (6 asm instructions) through MCA Daemon and it reported a far higher instruction count (in the hundreds of thousands). Below is a detailed summary of the issue and how to reproduce it.
I first started with the simplest C program I could think of, `demo.c`, and compiled it down to a binary. The contents of `demo.s` are:
.text
.file "demo.c"
.globl main # -- Begin function main
.p2align 4, 0x90
.type main,@function
main: # @main
.cfi_startproc
# %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
movl $0, -4(%rbp)
xorl %eax, %eax
popq %rbp
.cfi_def_cfa %rsp, 8
retq
.Lfunc_end0:
.size main, .Lfunc_end0-main
.cfi_endproc
# -- End function
.ident "clang version 12.0.1"
.section ".note.GNU-stack","",@progbits
.addrsig
Note that there are 6 instructions in this program. I've even tried removing all the extra info besides the main label and the 6 asm instructions, but I get the same results. This number of instructions will be relevant below.
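Now I start the `llvm-mcad` server with the qemu-broker and then run qemu with the broker. qemu terminates after printing its connected message, and `llvm-mcad` reports 126304 instructions, which does not sound correct to me. So I tried the asm broker instead, which gave an output of: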
=== Printing report for Region [0] ===
Iterations: 1
Instructions: 6
Total Cycles: 11
Total uOps: 8
Dispatch Width: 6
uOps Per Cycle: 0.73
IPC: 0.55
Block RThroughput: 2.0
This to me looks correct according to the demo.s file above. I am curious what is causing the qemu broker to report the incorrect information.
Then I learned there was the -only-main-code option which:
only sends instructions that are belong to the main executable. This flag can get rid of unrelated execution traces, like those generated from interpreter (i.e. ld.so). But this might also get rid of shared library loaded during run-time.
However, if we run it with this option as specified in the readme, nothing changes about the analysis: we still see hundreds of thousands of instructions.
I conclude that the `-only-main-code` option does not work as expected.
Solution
The current way that we are checking whether to ignore an instruction is specified by the following condition:
if (OnlyMainCode && VAddr < *CodeStartAddr)
There are two problems with this:
1. `CodeStartAddr` is set from the `qemu_plugin_vcpu_code_start_vaddr` function, which returns `code_offset`. In my case, `code_offset` is always zero. I think we should be using `start_code` instead. This commit changes from `start_code` to `code_offset`. Do you have any explanation of why this was required? `code_offset` is suspiciously 0 for me, and `start_code` sounds like what we're after.
2. We do not filter in the other direction: instructions located past the end of the text segment of the executable we pass to qemu are still sent. That is why I add changes to ignore instructions whose address is after `end_code`.
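Both problems can be reproduced in isolation. The sketch below is only an illustration, not the plugin's code: `skipBelowStart` is a hypothetical stand-in for the existing condition, addresses are assumed to be unsigned 64-bit values, and `0x7f0000001000` is a made-up address standing in for loader or library code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the existing check: an instruction is skipped
// only when it lies below the reported code start address.
constexpr bool skipBelowStart(bool onlyMainCode, std::uint64_t vaddr,
                              std::uint64_t codeStart) {
  return onlyMainCode && vaddr < codeStart;
}

inline void checkLowerBoundOnly() {
  // Problem 1: with a start address of 0, no unsigned address compares
  // below it, so nothing is ever skipped.
  assert(!skipBelowStart(true, 0x0, 0));
  assert(!skipBelowStart(true, 0x400000, 0));
  assert(!skipBelowStart(true, 0x7f0000001000ULL, 0));

  // Problem 2: even with a sensible start address, code mapped above the
  // main executable still passes, because there is no upper bound.
  assert(!skipBelowStart(true, 0x7f0000001000ULL, 0x400000));

  // Only addresses below the start are ever filtered.
  assert(skipBelowStart(true, 0x3ff000, 0x400000));
}
```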
So the new change looks as follows:
if (OnlyMainCode &&
(VAddr < *CodeStartAddr || VAddr > *CodeEndAddr))
**Please note that `VAddr < CodeStartAddr` never evaluates to true (all the extra instructions lie after `CodeEndAddr`), but I include it here for robustness.**
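As a standalone illustration of the revised condition, here is a minimal sketch of the range check. The helper name `outsideMainCode` is hypothetical, and modelling the bounds as `std::optional` (C++17) is an assumption based on the dereferences in the condition. Treating unset bounds as "do not filter" is also a choice made for this sketch, not something the patch is known to do. The end address is treated as inclusive, matching `VAddr > *CodeEndAddr`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper mirroring the revised condition: an instruction is
// skipped when filtering is enabled and its address falls outside the
// [codeStart, codeEnd] range of the main executable.
inline bool outsideMainCode(bool onlyMainCode, std::uint64_t vaddr,
                            std::optional<std::uint64_t> codeStart,
                            std::optional<std::uint64_t> codeEnd) {
  // Assumption for this sketch: without both bounds we cannot filter safely.
  if (!onlyMainCode || !codeStart || !codeEnd)
    return false;
  return vaddr < *codeStart || vaddr > *codeEnd;
}

inline void checkRangeFilter() {
  // Bounds taken from the debug output of the run shown in this description.
  const std::uint64_t start = 0x400000, end = 0x400608;
  assert(!outsideMainCode(true, start, start, end));   // first address kept
  assert(!outsideMainCode(true, end, start, end));     // last address kept
  assert(outsideMainCode(true, end + 1, start, end));  // past the end: skipped
  assert(outsideMainCode(true, start - 1, start, end)); // before start: skipped
}
```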
With these changes in place I get the following analysis:
~/repos/qemu/build/qemu-x86_64 -plugin ~/repos/LLVM-MCA-Daemon/.build/plugins/qemu-broker/Qemu/libQemuRelay.so,arg="-only-main-code",arg="-addr=127.0.0.1",arg="-port=9487",arg="-debug" -d plugin demo.o
Args: MCADRelay -only-main-code -addr=127.0.0.1 -port=9487 -debug
Using QEMU target x86_64
note: Connected to 127.0.0.1:9487
Code start address: 0x00000000400000
Code end address: 0x00000000400608
Total number of executed instructions: 87
Iterations: 1
Instructions: 87
Total Cycles: 122
Total uOps: 106
Dispatch Width: 6
uOps Per Cycle: 0.87
IPC: 0.71
Block RThroughput: 17.7
Cleaning up worker thread...
I am happy that the number of instructions has dropped to 87, but am concerned that the number of instructions is still not 6.
Discussion
To the reviewers, do you have any opinion on why we may be seeing 87 instructions instead of 6? Is the solution correct but somewhere there is a loss of precision?