tum-ei-eda / etiss

Extendable Translating Instruction Set Simulator
https://tum-ei-eda.github.io/etiss/

Uninitialized Nodes in Instruction Set Tree #74

Open fpedd opened 3 years ago

fpedd commented 3 years ago

When uncommenting https://github.com/tum-ei-eda/etiss/blob/4c3631391c81ef49d292e030c09ffd085cb3c70c/ArchImpl/RISCV/RISCVArchSpecificImp.h#L386 the tree structure of the instruction set/instruction decoder gets printed.

However, some nodes in the compressed instruction set tree are printed as "uninitialized" (arrows <----- inserted by me):

...
MODE 1:     ISA16_RISCV[default: 16]:
        ISA16_RISCV[16]:
            @0x0 Node[1:0]
                @0x0 Node[15:13]
                    @0x0 Uninitialized Node <-----
                    @0x1 Instruction: c.fld
                    @0x2 Instruction: c.lw
                    @0x3 Instruction: c.flw
                    @0x5 Instruction: c.fsd
                    @0x6 Instruction: c.sw
                    @0x7 Instruction: c.fsw
                @0x1 Node[15:13]
                    @0x0 Uninitialized Node <-----
                    @0x1 Instruction: c.jal
                    @0x2 Instruction: c.li
                    @0x3 Uninitialized Node <-----
                    @0x4 Node[11:10]
                        @0x0 Node[12:12]
                            @0x0 Instruction: c.srli
                        @0x1 Node[12:12]
                            @0x0 Instruction: c.srai
                        @0x2 Instruction: c.andi
                        @0x3 Node[6:5]
                            @0x0 Node[12:12]
                                @0x0 Instruction: c.sub
                            @0x1 Node[12:12]
                                @0x0 Instruction: c.xor
                            @0x2 Node[12:12]
                                @0x0 Instruction: c.or
                            @0x3 Node[12:12]
                                @0x0 Instruction: c.and
                    @0x5 Instruction: c.j
                    @0x6 Instruction: c.beqz
                    @0x7 Instruction: c.bnez
                @0x2 Node[15:13]
                    @0x0 Node[12:12]
                        @0x0 Instruction: c.slli
                    @0x1 Instruction: c.fldsp
                    @0x2 Instruction: c.lwsp
                    @0x3 Instruction: c.flwsp
                    @0x4 Node[12:12]
                        @0x0 Uninitialized Node <-----
                        @0x1 Uninitialized Node <-----
                    @0x5 Instruction: c.fsdsp
                    @0x6 Instruction: c.swsp
                    @0x7 Instruction: c.fswsp
...

A node gets printed as uninitialized when this condition evaluates to false: https://github.com/tum-ei-eda/etiss/blob/4c3631391c81ef49d292e030c09ffd085cb3c70c/src/Instruction.cpp#L739

The second uninitialized node

@0x1 Node[15:13]
    @0x0 Uninitialized Node <-----

corresponds to the c.addi instruction. The binary I am compiling (with an rv32gc compiler) uses this instruction multiple times, so I would expect ETISS to throw some sort of error when running it. However, the binary using these "uninitialized instructions" runs without any issues.

What is happening here? Why are those nodes printed as uninitialized? Why does the binary run anyway? Any help is appreciated! :)

PS: I am mainly asking because I am working on something else, where some instructions/nodes of a RISC-V instruction set extension are also printed as "uninitialized". However, those uninitialized instructions cause some trouble and I am trying to understand why and where the underlying issue is.

wysiwyng commented 3 years ago

Please check whether this instruction is actually executed; I am pretty confident it is (otherwise ETISS would complain, as you already noted). You can do that, e.g., by using the PrintInstruction plugin, or by placing a breakpoint somewhere here: https://github.com/tum-ei-eda/etiss/blob/91fa3b3c7029241173380f99f8547b7ebef8b5cd/ArchImpl/RISCV/RISCVArch.cpp#L12140 and running ETISS with a debugger.

The instruction tree printing stuff has some issues, but usually these don't mean the decoder is not working. @rafzi might know more as to why the instruction tree prints do not work as expected.

fpedd commented 3 years ago

Some more information:

Compiling the following main.c with an rv32gcv toolchain:

#include <stdlib.h>
#include <stdio.h>

int main()
{
    asm("addi a1, a1, 1");
    asm("c.addi a1, 1");
    printf("hello world!\n");
}

and dumping the binary with riscv32-unknown-elf-objdump -h -S riscv_example.elf > riscv_example.lst gives:

0000008c <main>:
#include <stdlib.h>
#include <stdio.h>

int main()
{
      8c:   1141                    addi    sp,sp,-16
      8e:   c606                    sw  ra,12(sp)
      90:   c422                    sw  s0,8(sp)
      92:   0800                    addi    s0,sp,16
    asm("addi a1, a1, 1");
      94:   0585                    addi    a1,a1,1
    asm("c.addi a1, 1");
      96:   0585                    addi    a1,a1,1
    printf("hello world!\n");
      98:   67b1                    lui a5,0xc
      9a:   e6878513            addi    a0,a5,-408 # be68 <__DTOR_END__+0x1a>
      9e:   135010ef            jal ra,19d2 <puts>
      a2:   4781                    li  a5,0
}
      a4:   853e                    mv  a0,a5
      a6:   40b2                    lw  ra,12(sp)
      a8:   4422                    lw  s0,8(sp)
      aa:   0141                    addi    sp,sp,16
      ac:   8082                    ret

Most of the instructions are 16-bit compressed instructions (for some reason the disassembly prints them with their uncompressed mnemonics). Because the assembler is responsible for converting "normal" instructions into compressed ones (only when compressed support is available, of course), the inline-assembly addi at address 0x94 is also converted to its compressed equivalent.

Running this with the PrintInstruction plugin enabled gives:

...
0x000000000000008c: c.addi # 0x0x1141 [UNKNOWN PARAMETERS]
0x000000000000008e: c.swsp # 0x0xc606 [UNKNOWN PARAMETERS]
0x0000000000000090: c.swsp # 0x0xc422 [UNKNOWN PARAMETERS]
0x0000000000000092: c.addi4spn # 0x0x0800 [UNKNOWN PARAMETERS]
0x0000000000000094: c.addi # 0x0x0585 [UNKNOWN PARAMETERS]
0x0000000000000096: c.addi # 0x0x0585 [UNKNOWN PARAMETERS]
0x0000000000000098: c.lui # 0x0x67b1 [UNKNOWN PARAMETERS]
0x000000000000009a: addi # 0x0xe6878513 [UNKNOWN PARAMETERS]
0x000000000000009e: jal # 0x0x135010ef [UNKNOWN PARAMETERS]
...

I also set a breakpoint with the target gdb at one of the inline addi instructions and examined the memory at the program counter, which supports the claim that a "compressed add immediate" is indeed executed:

(gdb) x $pc
0x94 <main+8>: 0x05850585

With the CoreDSL definition of the c.addi instruction:

C.ADDI {
encoding:b000 | imm[5:5]s | rs1[4:0] | imm[4:0]s | b01;
args_disass: "{name(rs1)}, {imm:#05x}";
X[rs1] <= X[rs1]'s + imm;
}

the halfword 0x0585 -> 0b 0000 0101 1000 0101 -> 0b 000 0 01011 00001 01 matches the c.addi encoding, with register a1 -> x11 -> 0b01011 and an immediate value of 1.

So I am fairly certain that a "compressed add immediate" is executed.

Coming back to the instruction tree, and following the encoding from above (b000 | imm[5:5]s | rs1[4:0] | imm[4:0]s | b01) through Node[1:0] = 0x1 and then Node[15:13] = 0x0:

    @0x0 Node[1:0]
        @0x0 Node[15:13]
            @0x0 Uninitialized Node
            @0x1 Instruction: c.fld
            @0x2 Instruction: c.lw
            @0x3 Instruction: c.flw
            @0x5 Instruction: c.fsd
            @0x6 Instruction: c.sw
            @0x7 Instruction: c.fsw
        @0x1 Node[15:13]
            @0x0 Uninitialized Node <-----

the c.addi node, however, is printed as uninitialized.