Description of the Issue

I ran an extremely simple program (6 assembly instructions) through MCA Daemon, and it reported a much higher instruction count (in the hundreds of thousands). Below is a detailed summary of the issue and how to reproduce it.

I first started with the simplest C program I could think of:

demo.c:

int main() { return 0; }

I compiled it to assembly, then assembled and linked the result into an executable (named demo.o):

$ clang demo.c -S -o demo.s -target x86_64-unknown-linux-gnu
$ clang demo.s -o demo.o

The contents of demo.s are:

.text
.file "demo.c"
.globl main                            # -- Begin function main
.p2align 4, 0x90
.type main,@function
main:                                   # @main
.cfi_startproc
# %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
movl $0, -4(%rbp)
xorl %eax, %eax
popq %rbp
.cfi_def_cfa %rsp, 8
retq
.Lfunc_end0:
.size main, .Lfunc_end0-main
.cfi_endproc
                                        # -- End function
.ident "clang version 12.0.1"
.section ".note.GNU-stack","",@progbits
.addrsig

Note that this program contains 6 instructions. I even tried removing everything except the main label and the 6 instructions, and got the same results. This instruction count will be relevant below.

Now I start the llvm-mcad server with the qemu-broker:

$ ./llvm-mcad -mtriple="x86_64-unknown-linux-gnu" -mcpu="skylake" \
            --load-broker-plugin=$PWD/plugins/qemu-broker/libMCADQemuBroker.so \
            -broker-plugin-arg-host="localhost:9487" &

Next, I run qemu with the broker:

$ ~/repos/qemu/build/qemu-x86_64 -plugin ~/repos/LLVM-MCA-Daemon/.build/plugins/qemu-broker/Qemu/libQemuRelay.so,arg="-addr=127.0.0.1",arg="-port=9487" -d plugin demo.o

Now I get the following output from qemu:

note: Connected to 127.0.0.1:9487

qemu terminates right after printing this message, and then llvm-mcad prints:

Instructions:      126304
Total Cycles:      56050
Total uOps:        144559

Dispatch Width:    6
uOps Per Cycle:    2.58
IPC:               2.25
Block RThroughput: 24093.2
Cleaning up worker thread...

[1]  + 160038 done       ./llvm-mcad -mtriple="x86_64-unknown-linux-gnu" -mcpu="skylake"

I stopped here, because 126304 instructions did not sound correct to me.

So I tried the asm broker instead:

$ ./llvm-mcad -mtriple="x86_64-unknown-linux-gnu" -mcpu="skylake" --broker=asm < demo.s

which gave the following output:

=== Printing report for Region [0] ===
Iterations:        1
Instructions:      6
Total Cycles:      11
Total uOps:        8

Dispatch Width:    6
uOps Per Cycle:    0.73
IPC:               0.55
Block RThroughput: 2.0

This looks correct to me given the demo.s file above. I am curious what causes the qemu broker to report such different numbers.

Then I learned about the -only-main-code option, which:

only sends instructions that belong to the main executable. This flag can get rid of unrelated execution traces, like those generated from the interpreter (i.e. ld.so). But this might also get rid of shared libraries loaded at run time.

However, if we run it with this option as specified in the README:

qemu-x86_64 -plugin ~/repos/LLVM-MCA-Daemon/.build/plugins/qemu-broker/Qemu/libQemuRelay.so,arg="-only-main-code",arg="-addr=127.0.0.1",arg="-port=9487",arg="-debug" -d plugin demo.o

nothing changes in the analysis: we still see hundreds of thousands of instructions.

I now conclude that the -only-main-code option does not work as expected.

Solution

Currently, whether to ignore an instruction is decided by the following condition:

if (OnlyMainCode && VAddr < *CodeStartAddr)

There are two problems with this:

1. CodeStartAddr is set by the qemu_plugin_vcpu_code_start_vaddr function, which returns code_offset. In my case code_offset is always 0, which is suspicious; start_code sounds like what we're after, so I think we should use it instead. This commit changed the value from start_code to code_offset. Is there a reason that change was required?
2. The check only works in one direction: it never ignores instructions located past the end of the code segment of the executable passed to qemu. That is why I also ignore instructions whose address is beyond end_code.
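A minimal standalone sketch of the original check makes both problems visible. It models `CodeStartAddr` as a `std::optional<uint64_t>` and names the function `shouldIgnoreOld`. Both are assumptions for illustration, not the plugin's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Sketch of the original -only-main-code filter.
// Assumption: CodeStartAddr is modeled as std::optional<uint64_t>;
// the plugin's real type may differ.
bool shouldIgnoreOld(bool OnlyMainCode, uint64_t VAddr,
                     const std::optional<uint64_t> &CodeStartAddr) {
  assert(CodeStartAddr.has_value() && "code start must be known");
  // Only addresses *below* the code start are dropped; nothing above it is.
  return OnlyMainCode && VAddr < *CodeStartAddr;
}
```

If the start comes from code_offset (0), `VAddr < 0` is false for every unsigned address, so nothing is ignored. If it comes from start_code (0x400000), code mapped above the executable is still kept. A hypothetical ld.so address such as 0x4000801000 is one example.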

So the new condition looks as follows:

if (OnlyMainCode &&
        (VAddr < *CodeStartAddr || VAddr > *CodeEndAddr))

**Please note that `VAddr < *CodeStartAddr` never evaluates to true (all the extra instructions are located after `*CodeEndAddr`), but I include it here for robustness.**
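The proposed condition can be sketched in the same standalone way. Again, the `std::optional<uint64_t>` bounds and the name `shouldIgnoreNew` are assumptions for illustration, not the plugin's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Sketch of the proposed filter: keep only instructions inside the main
// executable's code range. Assumption: both bounds are std::optional<uint64_t>.
bool shouldIgnoreNew(bool OnlyMainCode, uint64_t VAddr,
                     const std::optional<uint64_t> &CodeStartAddr,
                     const std::optional<uint64_t> &CodeEndAddr) {
  if (!OnlyMainCode)
    return false;
  assert(CodeStartAddr && CodeEndAddr && "code range must be known");
  // Same test as the patch: drop anything outside [start, end].
  return VAddr < *CodeStartAddr || VAddr > *CodeEndAddr;
}
```

With the bounds reported in the debug output (0x400000 and 0x400608), `shouldIgnoreNew(true, 0x4000801000, 0x400000, 0x400608)` returns true. An address inside the range, such as 0x400100, returns false.

QEMU may set end_code to one past the last code byte; this is an assumption worth verifying against linux-user/elfload.c. If it does, `VAddr >= *CodeEndAddr` is the exact bound. The difference only affects an instruction starting exactly at end_code.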

With these changes in place, I get the following analysis:

~/repos/qemu/build/qemu-x86_64 -plugin ~/repos/LLVM-MCA-Daemon/.build/plugins/qemu-broker/Qemu/libQemuRelay.so,arg="-only-main-code",arg="-addr=127.0.0.1",arg="-port=9487",arg="-debug" -d plugin demo.o
Args: MCADRelay -only-main-code -addr=127.0.0.1 -port=9487 -debug
Using QEMU target x86_64
note: Connected to 127.0.0.1:9487
Code start address: 0x00000000400000
Code end address: 0x00000000400608
Total number of executed instructions: 87
Iterations:        1
Instructions:      87
Total Cycles:      122
Total uOps:        106

Dispatch Width:    6
uOps Per Cycle:    0.87
IPC:               0.71
Block RThroughput: 17.7
Cleaning up worker thread...

I am happy that the instruction count has dropped to 87, but I am concerned that it is still not 6.
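A rough size check puts the gap in context. The range in the debug output spans 0x400608 - 0x400000 = 0x608 = 1544 bytes. The six instructions of `main` encode to 15 bytes on x86-64 (1 + 3 + 7 + 2 + 1 + 1 for push, mov, movl, xor, pop, ret). So the range check accepts any executed code in the executable's code segment, not only `main`.

To see how many of the 87 instructions fall inside `main` itself, each address could be tested against the symbol's own range. The helper below is hypothetical. `SymAddr` and `SymSize` would be `main`'s value and size from the symbol table, for example as shown by `readelf -s demo.o`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical diagnostic helper: does VAddr fall inside one function's bytes?
// SymAddr and SymSize are inputs taken from the ELF symbol table
// (st_value and st_size of `main`); the plugin does not provide them.
bool isInSymbol(uint64_t VAddr, uint64_t SymAddr, uint64_t SymSize) {
  assert(SymAddr + SymSize >= SymAddr && "symbol range must not wrap");
  // Half-open range [SymAddr, SymAddr + SymSize).
  return VAddr >= SymAddr && VAddr < SymAddr + SymSize;
}
```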

Discussion

To the reviewers: do you have any idea why we might be seeing 87 instructions instead of 6? Is the solution correct, with a loss of precision somewhere else?

securesystemslab / LLVM-MCA-Daemon

fix -only-main-code #1