harvard-acc / LLVM-Tracer

An LLVM pass to profile dynamic LLVM IR instructions and runtime values

Dynamic trace #9

Closed andrestoga closed 8 years ago

andrestoga commented 8 years ago

Hi Sophia,

I checked the generated trace for the triad example program and I didn't understand it. I was expecting something like LLVM IR, or a golden trace like this one:

https://github.com/DependableSystemsLab/LLFI/wiki/Generate-analyse-traces

with opcodes. I'm not an expert in LLVM. I'd appreciate any hint or help in order to understand it.

Thank you very much!

Kind regards,

ysshao commented 8 years ago

Hi Andres,

Sorry, I forgot about this question. After you run the triad program, you should have two files: triad.llvm and dynamic_trace.gz. triad.llvm is the LLVM IR assembly that clang produced. In it you can find the same kind of opcode names, such as getelementptr and load, that appear in the golden trace example. The output of LLVM-Tracer is the dynamic_trace.gz file, which gives you a dynamic trace of the executed binary. It's encoded in the following format:

  1. Each block of lines represents the dynamic information of one dynamic instruction.
  2. Each block always starts with a line beginning with "0". This line contains the instruction information in the format: 0,line-number,function-name,basic-block-id,inst-id,opcode,dynamic-inst-id

For example, the first line of the triad trace is: 0,5,triad,0,0-5,2,0

This describes the first instruction of triad. You can find it in the triad.llvm file: the first instruction, not counting the debugging instructions, is "br label %1, !dbg !86". The fields mean:
  a) it maps to line 5 in the source code
  b) its function name is "triad"
  c) its basic block id is 0
  d) its static inst id is 0-5 (we just want a unique identifier for each static instruction, so it's a string)
  e) its opcode is 2, which is br in LLVM IR
  f) its dynamic instruction id is 0, as it's the first instruction in the dynamic trace

  3. The following lines give information for each argument of the instruction, as well as for the result. The first column says which argument it is. For example, '1' means the line describes the first argument of the instruction, while 'r' means the result.

The format of these lines is: argument-id,size-of-argument,dynamic-value,is-register-or-not,if-register-register-name

For the first instruction in the triad example, it only has one argument, so the line of interest is: 1,0,0,1,1,

It means:
  a) it's the first argument of the instruction
  b) its size is 0
  c) its dynamic value is 0 (a label doesn't have a dynamic value)
  d) it's a register
  e) its register id is 1

All the lines in the dynamic_trace.gz file are encoded this way. I actually didn't realize that we don't have an explanation of the trace format. Thanks for the question! I'll add a more detailed description of the trace format later.
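To make the format above concrete, here is a minimal Python sketch of a reader for it. It is not part of LLVM-Tracer: the function names (`split_fields`, `parse_header`, `parse_operand`, `iter_blocks`, `read_trace`) and the dictionary keys are made up for illustration, and it assumes the trace contains only the header and argument/result lines described in this thread. Fields beyond the fifth on an argument line are kept uninterpreted under `extra`.

```python
import gzip


def split_fields(line):
    """Split one trace line on commas, dropping a single trailing comma."""
    text = line.strip()
    if text.endswith(","):
        text = text[:-1]
    return text.split(",")


def parse_header(line):
    """Parse a '0,...' line that starts a dynamic-instruction block."""
    fields = split_fields(line)
    if len(fields) < 7 or fields[0] != "0":
        raise ValueError("not a header line: %r" % line)
    return {
        "line_number": int(fields[1]),      # -1 when there is no source line
        "function": fields[2],
        "basic_block": fields[3],           # kept as a string
        "inst_id": fields[4],               # static id, a string such as "0-5"
        "opcode": int(fields[5]),
        "dynamic_inst_id": int(fields[6]),
    }


def parse_operand(line):
    """Parse an argument ('1', '2', ...) or result ('r') line."""
    fields = split_fields(line)
    if len(fields) < 4:
        raise ValueError("not an operand line: %r" % line)
    return {
        "operand": fields[0],
        "size": int(fields[1]),
        "value": fields[2],                 # left as text; may not be an integer
        "is_register": fields[3] == "1",
        "register_name": fields[4] if len(fields) > 4 else "",
        "extra": fields[5:],                # any further fields, uninterpreted
    }


def iter_blocks(lines):
    """Group lines into blocks: one header plus its operand lines."""
    block = None
    for line in lines:
        if not line.strip():
            continue                        # skip blank lines, if any
        if line.lstrip().startswith("0,"):
            if block is not None:
                yield block
            block = {"header": parse_header(line), "operands": []}
        elif block is not None:
            block["operands"].append(parse_operand(line))
    if block is not None:
        yield block


def read_trace(path):
    """Yield blocks from a gzip-compressed trace such as dynamic_trace.gz."""
    with gzip.open(path, "rt") as trace:
        yield from iter_blocks(trace)
```

For example, `next(read_trace("dynamic_trace.gz"))["header"]` would give the fields of the first dynamic instruction, assuming the file is in the current directory.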

Let me know whether this is clear to you.

Thanks, Sophia

andrestoga commented 8 years ago

Hi Sophia,

Thank you very much for the detailed explanation! I have some questions:

* It is very common to find a -1 in the line-number field, as in this example:

  0,-1,triad,1,indvars.iv,48,61
  2,64,0,1,indvars.iv.next,1,
  1,64,0,0, ,0,
  r,64,5,1,indvars.iv,

Do you know what it means?

* In the example you explained, you say that opcode 2 in LLVM IR is br. Where can I find the decoding for the other opcodes? For example, what does opcode 29 mean?

Again, thank you very much for your help!

Best,

zhguanw commented 8 years ago

Hi Andres,

"-1" means the instruction has no corresponding line in your source code, for example an LLVM PHI instruction. The line information is extracted from the debug metadata in the LLVM IR.

There is a file that lists the opcodes; you can search for "LLVM_IR_Br" to locate it. "29" may be the store operation.

@Sophia, I extracted and modified part of Aladdin (with some optimizations) for my FPGA project, which also uses the DDDG. Thanks for sharing the code. :-)

Best regards, Guanwen (Henry)


zhguanw commented 8 years ago

Sorry Andres, I just double-checked the source code. Opcode "29" is the GetElementPtr instruction; "28" is the store operation. The file is "common/opcode_func.h". "grep" is a good helper for learning new things. :-)


ysshao commented 8 years ago

Thanks @zhguanw for the answers! And glad to know Aladdin and LLVM-Tracer are useful for your project! =D

@andrestoga The opcode_func.h file @zhguanw mentioned is in the Aladdin project. You can find it here:

https://github.com/ysshao/ALADDIN/blob/master/common/opcode_func.h

It gives you the mapping from each LLVM opcode to the unique microop ID used in the trace.
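As a small illustration of using that mapping when reading a trace, the Python sketch below decodes only the three opcodes confirmed in this thread (2, 28 and 29). The names `KNOWN_OPCODES` and `opcode_name` are hypothetical, and the table is deliberately incomplete; the full list should be taken from opcode_func.h rather than guessed.

```python
# Opcode numbers confirmed in this thread only; extend from opcode_func.h.
KNOWN_OPCODES = {
    2: "br",
    28: "store",
    29: "getelementptr",
}


def opcode_name(opcode):
    """Return the LLVM IR mnemonic for a trace opcode, if it is in the table."""
    return KNOWN_OPCODES.get(opcode, "unknown(%d)" % opcode)
```

Calling `opcode_name(29)` returns "getelementptr", and any opcode missing from the table comes back marked as unknown instead of raising an error.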

andrestoga commented 8 years ago

Thank you very much @ysshao and @zhguanw! I'm only using LLVM-Tracer at the moment, not the Aladdin project, which is why I didn't find the file using grep.

Best,

ysshao commented 8 years ago

Great!

andrestoga commented 8 years ago

Hi Sophia,

Sorry for posting again in a closed issue. In the explanation you gave above, you said there are only five fields per line in the argument lines. In this example (and in many others), the first argument line has six fields:

  0,-1,triad,1,indvars.iv,48,61
  2,64,0,1,indvars.iv.next,1,    <----
  1,64,0,0, ,0,
  r,64,5,1,indvars.iv,

Do you know what the sixth field means?

Best,

ysshao commented 8 years ago

Hi @andrestoga, thanks for your patience. I have been traveling for the past month. To answer your question: a sixth field is used for LLVM phi nodes. It is supposed to indicate that the instruction is a phi node and to give the incoming basic block id for each argument.

On that note, the line you showed here does not actually seem to do what it is supposed to do. Let me double-check that and get back to you later. Thanks for pointing this out!
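For anyone who wants to inspect this in their own trace, here is a small Python sketch that pulls out the sixth field from the argument lines of one block. The function name `sixth_fields` is made up for illustration. Reading the value as an incoming basic block id follows the intent described above and, as noted, may not match what the tracer actually prints.

```python
def sixth_fields(operand_lines):
    """Map argument id -> sixth field, for lines that have one."""
    found = {}
    for line in operand_lines:
        text = line.strip()
        if text.endswith(","):
            text = text[:-1]            # drop the single trailing comma
        fields = text.split(",")
        if len(fields) >= 6:
            found[fields[0]] = fields[5]
    return found
```

Passing the three argument/result lines of the phi block quoted in this thread returns entries for arguments "2" and "1" only, since the result line has just five fields.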

Sophia