riscv-collab / riscv-gnu-toolchain

GNU toolchain for RISC-V, including GCC
Other
3.54k stars 1.17k forks source link

Adding custom instruction to the elf tool-chain #837

Closed subhajit26 closed 2 years ago

subhajit26 commented 3 years ago

Hi, I am trying to get familiar with the whole flow of adding custom instruction to the elf tool-chain and I am planning to implement a mod instruction since it is easily accessible over the internet. Since tool-chain building is very time consuming process I need to confirm certain things before I go ahead with custom instruction flow. First, there are 2 riscv-opc.c and riscv-opc.h files in riscv-gdb and riscv-binutils folder. Do I need to update both of them in order to get the custom instruction implemented successfully. Second, do I need to run this command "./configure --prefix=/opt/riscv make" again after adding the custom instruction because while building for the first time I could see that as soon as make is executed the tool-chain started cloning and building. So my doubt is that if I start building again will the opc files contain those updated information . Please suggest. Please also suggest do I need to make any modification in the Makefile regarding the addition of custom instruction.

Nelson1225 commented 3 years ago

You can check this issue at first, this is helpful, https://github.com/riscv/riscv-gnu-toolchain/issues/782

Since tool-chain building is very time consuming process I need to confirm certain things before I go ahead with custom instruction flow.

If you want to add the instructions in gcc, then it's more complicate. If you just want to add the instructions in binutils, then only build the binutils is enough, you don't need to build the entire toolchain.

First, there are 2 riscv-opc.c and riscv-opc.h files in riscv-gdb and riscv-binutils folder. Do I need to update both of them in order to get the custom instruction implemented successfully.

If you want to disassemble the instructions by gdb, then yes, you should update both of them. If you need new operands or relocation for your custom instructions, then you need to update more files. Otherwise, just update the files you mentioned should be enough.

Second, do I need to run this command "./configure --prefix=/opt/riscv make" again after adding the custom instruction because while building for the first time I could see that as soon as make is executed the tool-chain started cloning and building. So my doubt is that if I start building again will the opc files contain those updated information.

Generally, no need to configure again. But it is better to run "make clean" in your build folder before running "make". As the previous said, if you don't need to add the custom instructions in gcc, then you can just rebuild the binutils/gdb.

cd /build-[binutils|gdb]-[newlib|linux]/ make && make install

subhajit26 commented 3 years ago

Hi Nelson , Thank you very much for this useful information. I am gonna try to implement all the approaches that you have mentioned.

subhajit26 commented 3 years ago

Hi, In order to add the mod instruction in gcc I added this 2 line #define MATCH_MOD 0x200006b #define MASK_MOD 0xfe00707f and DECLARE_INSN(mod, MATCH_MOD, MASK_MOD) to the riscv-opc.h file in both binutils and gdb directory. for riscv-opc.c files in both the directories I added {"mod", "I", "d,s,t", MATCH_MOD, MASK_MOD, match_opcode, 0 } somewhere in between in "const struct riscv_opcode riscv_opcodes[] = ".
Then I did make clean and then Make. After the build is done I tried to compile a file but it shows me the error below: internal error: can't hash `sll': exists Assembler messages: Fatal error: Broken assembler. No assembly attempted. Please suggest how to fix this issue because I am not being able to understand that where am I going wrong.

subhajit26 commented 3 years ago

Hi, I fixed the error and now it works . I got the mod instruction in the dump file. I believe the error was because of putting the mod instruction between the subsets in the riscv-opc.c file although I am not sure but somehow I figured it out from the word sll in the error.

subhajit26 commented 3 years ago

Hi, I am trying to add the functionality of the custom instruction mod in the below code. But the problem the issue is that in the dump file mod is coming only in the place of inline assembly instruction but not for the "c = a % b" which makes me realize that the instruction has been added bit the functionality is not implemented rather for "c = a % b" rem instruction is coming which is already added beforehand. Can you please tell me where am I going wrong or it will be helpful if you can point me to the file where the functionality needs to be implemented.

 `#include <stdio.h>
int main(){
  int a,b,c;
  a = 5;
  b = 2;
  asm volatile
  (
    "mod   %[z], %[x], %[y]\n\t"
    : [z] "=r" (c)
    : [x] "r" (a), [y] "r" (b)
  );
  c = a % b
  printf("The modulus is:%d", c);  
 /*
  if ( c != 1 ){
     printf("\n[[FAILED]]\n");
     return -1;
  }

  printf("\n[[PASSED]]\n");
*/
  return 0;
}`

snippets of dump file:
asm volatile
   10122:   fec42783            lw  a5,-20(s0)
   10126:   fe842703            lw  a4,-24(s0)
   1012a:   02e787eb            mod a5,a5,a4
   1012e:   fef42223            sw  a5,-28(s0)
    "mod   %[z], %[x], %[y]\n\t"
    : [z] "=r" (c)
    : [x] "r" (a), [y] "r" (b)
  );

  c = a%b;
   10132:   fec42703            lw  a4,-20(s0)
   10136:   fe842783            lw  a5,-24(s0)
   1013a:   02f767b3            rem a5,a4,a5
   1013e:   fef42223            sw  a5,-28(s0)
  printf("The modulus is:%d", c);  
   10142:   fe442583            lw  a1,-28(s0)
   10146:   67f9                    lui a5,0x1e
   10148:   3d078513            addi    a0,a5,976 # 1e3d0 <__clzsi2+0x3c>
   1014c:   229d                    jal 102b2 <printf>
     return -1;
  }
jim-wilson commented 3 years ago

If you want the compiler to emit new instructions, you have to modify the compiler. You have probably only modified binutils so far.

The compiler work is much more involved than the binutils work, and depends on the complexity of the instruction. For simple instructions, you can just add a pattern to the machine description file. For complex instructions you may need to write a new optimization pass or modify an existing one. For a mod instruction, this can be done with a md file pattern. See the gcc/config/riscv/riscv.md file and search for "DIVISION and REMAINDER". The patterns here use macros "any_div" and "insn" that are defined above. If mod is same as rem, then you can modify the define_code_attr insn and substitute mod for rem. Otherwise you might need to write a new pattern.

You can find docs for this file in gcc/docs/md.texi or read the online internals docs https://gcc.gnu.org/onlinedocs/gccint/Machine-Desc.html#Machine-Desc

subhajit26 commented 3 years ago

Hi Jim , I got your point as I have already gone through the files you have mentioned. In the .md file I could see that in define_code_attr insn section mod functionality is implemented but supposedly with the naming of rem:

(define_code_attr insn [(ashift "sll")
            (ashiftrt "sra")
            (lshiftrt "srl")
            (div "div")
            (mod "rem")
            (udiv "divu")
            (umod "remu")
            (ior "or")
            (xor "xor")
            (and `"and")`
            (plus "add")
            (minus "sub")])

My purpose was to implement custom instruction to reduce the cycle but replacing the mod with rem will not solve my purpose it seems since it will only change the name. In the riscv-opc.h files in both the binutils and gdb folder I also found that there are lot of instruction declaration named custom. Are they already implemented with some functionality because I can see the match and mask bit for them or in other words can I use them for custom instruction purpose if they are unused, because in that way I do not need to define a pattern myself rather I can use the functionality if it solves my purpose. Please suggest.

jim-wilson commented 3 years ago

If you want a custom instruction you have to do a lot of work. There is no easy solution here.

I don't know what the "custom" stuff is for. I think it is some obsolete broken feature. Perhaps a feature that was replaced by the .insn feature. .insn is documented in the gas manual, and allows you to generate custom instruction encodings without modifying the gas sources.

subhajit26 commented 3 years ago

Hi Jim, Thanks a lot for guiding in a right way regarding the actual implementation of custom instruction. Regarding the custom instruction till now I have seen only video from British Computer Society Open Source Specialists named "Adding instruction to GNU assembler" which somewhat resembles what you have mentioned about the gas manual. Along with riscv-opc.h an driscv-opc.c, do I have to make certain modification and additions in riscv-opc.dis , tc-riscv.h, tc-riscv.c. I believe the last 2 files mentioned in the riscv-gdb/gas/config path where I can see there are encode type is mentioned and some pattern is defined. Since, I am absolutely new to this flow I am just worried that I do not break the tool-chain in any way but then I also want to learn the flow of adding custom instruction and make sure that the functionality implemented is bringing certain changes to the code. If there is any resources or any method that you can refer to apart from the document that you have already mentioned above, it will really help me to get the correct flow of adding custom instruction.

jim-wilson commented 3 years ago

I filed https://sourceware.org/bugzilla/show_bug.cgi?id=27348 to get the "custom" stuff fixed. It is obsolete as I expected.

Binutils has testsuites that you can use to check for errors. Try "make check". It is a good idea to add new testcases when adding new features. You can also test binutils by building the compiler and running the compiler testsuite.

How many files you need to modify depends on the complexity of the instruction you are adding. In the simplest case, you only need to modify riscv-opc.h and riscv-opc.c. Basically, write a testcase, and when you can assemble and disassemble it then you are done. You continue making changes until you reach this point.

Nelson1225 commented 3 years ago

I filed https://sourceware.org/bugzilla/show_bug.cgi?id=27348 to get the "custom" stuff fixed. It is obsolete as I expected.

Fixed and committed to mainline, thanks.

Nelson

subhajit26 commented 3 years ago

Hi Jim, I want to implement the absolute instruction for integer number but in the riscv.md file I can see this part for the floating type.

(define_insn "abs<mode>2"
  [(set (match_operand:ANYF           0 "register_operand" "=f")
    (abs:ANYF (match_operand:ANYF 1 "register_operand" " f")))]
  "TARGET_HARD_FLOAT"
  "fabs.<fmt>\t%0,%1"
  [(set_attr "type" "fmove")
   (set_attr "mode" "<UNITMODE>")])

I suppose that the functionality of the absolute has already been implemented. Can you please help me out with in which section of the file should I look to confirm the absolute functionality since I want to implement the functionality for integer.

(define_insn "abs<mode>2"
  [(set (match_operand:ANYI           0 "register_operand" "=r")
    (abs:ANYI (match_operand:ANYI 1 "register_operand" " r")))]

I was trying to define in this way for the integer part after going through the "parameterized name" from the docs that you have mentioned previously but then I am confused about how to write the rest of the portion of the define_insn. Since I am absolutely new to this compiler and tool-chain stuffs and dealing with machine description files for the first time I am getting bit confused on how to proceed with the instructions that are not defined or naming patterns that are not hard coded in compiler. I have one more request: Can you please give me the flow for any custom instruction as an example that needs to build from scratch because I am really lost between the exact number of files that needs to be modified. In the mean time I am going through the machine description document that you have referred and trying to understand the .md file.

subhajit26 commented 3 years ago

Hi Jim, I have tried to write the template for 32 bit hardware supported integer mode as per my understanding of the riscv.md file. Please correct me if I am wrong.

(define_insn "abs<mode>2"
  [(set (match_operand:SI          0 "register_operand" "=r")
    (abs:SI (match_operand:SI 1 "register_operand" " r")))]
  "TARGET_32_BIT"
  "abs.<fmt>\t%0,%1"
  [(set_attr "type" "move")
   (set_attr "mode" "<UNITMODE>")])

I am confused about this line: "abs.\t%0,%1" and set_attr "mode" " . What should be in place of UNITMODE.

LevyHsu commented 3 years ago

mode definition is also in the riscv.md

;; Main data type used by the insn (define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,SF,DF,TF" (const_string "unknown"))

;; This attribute gives the upper-case mode name for one unit of a ;; floating-point mode. (define_mode_attr UNITMODE [(SF "SF") (DF "DF")])

Since both of you op0 and op1 are in SImode, and your target is 32bit, I believe it would be ok to use SI

Some extra info from gccint you probably need: https://gcc.gnu.org/onlinedocs/gccint/Machine-Modes.html

LevyHsu commented 3 years ago

if like you said, you just want an abssi2, then adjust (set_attr "mode" "SI")]), also remove the <fmt>, then it should be fine.

Further more, if you want something like abssi2, absdi2 for SI and DImode for (define_insn "abs<mode>2" Then it doesn't seem to be a great idea to restrict match_operand:SI 0 and match_operand:SI to SImode only. otherwise absdi with DI operand won't go through this pattern. You may need to have a look at define_mode_iteratorfor complex mode, if none of them fits, write your own mode iterator. Also, the "TARGET_32_BIT" can be removed if it needs work for DImode under 64bit. just place it with "", then (set_attr "mode" "<MODE>")])

subhajit26 commented 3 years ago

Hi Levy, Does this look Ok?

(define_insn "abssi2"
  [(set (match_operand:SI          0 "register_operand" "=r")
    (abs:SI (match_operand:SI 1 "register_operand" " r")))]
  "abs\t%0,%1"
  [(set_attr "type" "move")
   (set_attr "mode" "SI")])

Regarding restriction of the of the match operand that you have mentioned , I want the custom instruction only for 32 bit data. That is the reason I have replaced ANYI with SI in particular as was mentioned in the define_mode_iterator .

Now according to my understanding from the overview section of the Machine-Desc document the template given above is inserted into instruction list. So is that all if I want get an absolute functionality implemented in the toolchain or do I need to make any more modifications in this file or any other file like if u can figure out my doubt from the chain mail above. This is my disassembly:

0001010e <main>:
#include <stdio.h>
#include <stdlib.h>
int main()
{
   1010e:   1101                    addi    sp,sp,-32
   10110:   ce06                    sw  ra,28(sp)
   10112:   cc22                    sw  s0,24(sp)
   10114:   1000                    addi    s0,sp,32
   int m = 200;     // m is assigned to 200
   10116:   0c800793            li  a5,200
   1011a:   fef42623            sw  a5,-20(s0)
   int n = -400;    // n is assigned to -400
   1011e:   e7000793            li  a5,-400
   10122:   fef42423            sw  a5,-24(s0)

   int c = abs(m);
   10126:   fec42783            lw  a5,-20(s0)
   1012a:   87fd                    srai    a5,a5,0x1f
   1012c:   fec42703            lw  a4,-20(s0)
   10130:   8f3d                    xor a4,a4,a5
   10132:   40f707b3            sub a5,a4,a5
   10136:   fef42223            sw  a5,-28(s0)
   int d =abs(n);
   1013a:   fe842783            lw  a5,-24(s0)
   1013e:   87fd                    srai    a5,a5,0x1f
   10140:   fe842703            lw  a4,-24(s0)
   10144:   8f3d                    xor a4,a4,a5
   10146:   40f707b3            sub a5,a4,a5
   1014a:   fef42023            sw  a5,-32(s0)

As you see abs function is taking instructions srai, xor and sub. But all I want is that when the compiler sees the abs function in C code it should show abs instead of all the three instructions. So how do I implement the functionality for the compiler to recognize it. Any suggestion will be of great help.

LevyHsu commented 3 years ago

The md file you wrote should work now, but like mentioned by Jim and Nelson, "If you want a custom instruction you have to do a lot of work. There is no easy solution here."

Because C code and assembly code are not that 1 on 1 match (or say, equal), therefore we need a compiler to do a series of "translation".

First, you need to know where a certain function is implemented, for your case, abs() is defined in /glibc(or newlib)/stdlib/abs.c: ` int abs (int i) { return i < 0 ? -i : i; } Then how does this c code translated to AST, GIMPLE, the match the MD template to produce RTL code? maybe dump some info: https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html If still, you find the dump info is not detailed enough, modify gcc to produce the info you need.

Narrowing down this a bit, how do other patterns in the .md file get matched (have a look at optabs.def), then trace it backward.

in your case I believe the most feasible way is to call the inline assembly: https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html#Using-Assembly-Language-with-C Then modify other necessary parts.

And my other suggestion includes using make -jnproc`` to accelerate the compiling process if you really want the hard way. It should take about 5~15 mins to build a new tool chain if you have 10+ cores with enough RAM and a nvme SSD.

Jim says a lot of work and no easy, he means it. :)

AnushaGoel14 commented 3 years ago

Hi, I tried Adding the custom instruction to spike ISA simulator, and have done below given steps:

In the riscv-isa-sim/riscv/encoding.h add the following lines:

define MATCH_MOD 0x200006b

define MASK_MOD 0xfe00707f

... DECLARE_INSN(mod, MATCH_MOD, MASK_MOD)

Create a file riscv-isa-sim/riscv/insns/mod.h and add these lines: WRITE_RD(sext_xlen(RS1 % RS2));

Add this file to riscv-isa-sim/riscv/riscv.mk.in riscv_insn_list = \ ... mod \ ...

In riscv-isa-sim/spike_main/disasm.cc add the following lines: DEFINE_RTYPE(mod);

And then I again build it, but its showing the following error: Building project riscv-isa-sim ../spike_main/disasm.cc: In constructor ‘disassembler_t::disassembler_t(int)’: ../spike_main/disasm.cc:295:38: error: ‘match_mod’ was not declared in this scope add_insn(new disasm_insnt(name, match##code, mask_##code | (extra), __VA_ARGS__)); ^ ../spike_main/disasm.cc:298:30: note: in expansion of macro ‘DISASM_INSN’

define DEFINE_RTYPE(code) DISASM_INSN(#code, code, 0, {&xrd, &xrs1, &xrs2})

                          ^~~~~~~~~~~

../spike_main/disasm.cc:425:3: note: in expansion of macro ‘DEFINE_RTYPE’ DEFINE_RTYPE(mod); ^~~~ ../spike_main/disasm.cc:295:38: note: suggested alternative: ‘match_fsd’ add_insn(new disasm_insnt(name, match##code, mask_##code | (extra), __VA_ARGS__)); ^ ../spike_main/disasm.cc:298:30: note: in expansion of macro ‘DISASM_INSN’

define DEFINE_RTYPE(code) DISASM_INSN(#code, code, 0, {&xrd, &xrs1, &xrs2})

                          ^~~~~~~~~~~

../spike_main/disasm.cc:425:3: note: in expansion of macro ‘DEFINE_RTYPE’ DEFINE_RTYPE(mod); ^~~~ ../spike_main/disasm.cc:295:52: error: ‘mask_mod’ was not declared in this scope add_insn(new disasm_insnt(name, match##code, mask_##code | (extra), __VA_ARGS__)); ^ ../spike_main/disasm.cc:298:30: note: in expansion of macro ‘DISASM_INSN’

define DEFINE_RTYPE(code) DISASM_INSN(#code, code, 0, {&xrd, &xrs1, &xrs2})

                          ^~~~~~~~~~~

../spike_main/disasm.cc:425:3: note: in expansion of macro ‘DEFINE_RTYPE’ DEFINE_RTYPE(mod); ^~~~ ../spike_main/disasm.cc:295:52: note: suggested alternative: ‘mask_fsd’ add_insn(new disasm_insnt(name, match##code, mask_##code | (extra), __VA_ARGS__)); ^ ../spike_main/disasm.cc:298:30: note: in expansion of macro ‘DISASM_INSN’

define DEFINE_RTYPE(code) DISASM_INSN(#code, code, 0, {&xrd, &xrs1, &xrs2})

                          ^~~~~~~~~~~

../spike_main/disasm.cc:425:3: note: in expansion of macro ‘DEFINE_RTYPE’ DEFINE_RTYPE(mod); ^~~~ make: *** [disasm.o] Error 1

Please let me know if am I missing any steps that I need to do?Or anything else that can be done to add new instructions in riscv

jim-wilson commented 3 years ago

On Wed, Mar 31, 2021 at 11:31 AM AnushaGoel14 @.***> wrote:

../spike_main/disasm.cc: In constructor ‘disassembler_t::disassembler_t(int)’: ../spike_main/disasm.cc:295:38: error: ‘match_mod’ was not declared in this scope add_insn(new disasm_insnt(name, match##code, mask_##code | (extra), VA_ARGS));

The match_* symbols are defined where encoding.h is included, which suggests that there is something wrong with your encoding.h changes. Try adding --save-temps to the gcc command used to build the disasm.cc file, and then look at the disasm.ii file to check that you got the expected result. You can use the command used to compile disasm.cc in the build.log file.

Jim

chenc6 commented 3 years ago

Hi @subhajit26 Did you see the Fatal error: duplicate add (maybe other instructions) when you put your new instructions in the subset?

subhajit26 commented 3 years ago

Hi @chenc6, Luckily, I did not face any such fatal error while implementing custom instructions. Regards, Subhajit.

On Thu, May 13, 2021 at 2:37 AM chenc6 @.***> wrote:

Hi @subhajit26 https://github.com/subhajit26 Did you see the Fatal error: duplicate add (maybe other instructions) when you put your new instructions in the subset?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/riscv/riscv-gnu-toolchain/issues/837#issuecomment-840195113, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQUE5YBRTU6O3M4Y376WHDDTNMNN7ANCNFSM4XAMIYJQ .

iams07 commented 3 years ago

Hi @subhajit26 Did you see the Fatal error: duplicate add (maybe other instructions) when you put your new instructions in the subset?

I believe that error is because your opcode is matching with the opcode of the add instruction, so there are basically two instructions which have the same opcode now. Just change the funct7 opcode and you should be fine

leduchuybk commented 3 years ago

Hi @subhajit26, I believe that you solved this problem. Can you share steps to add a new instruction to compiler?

TommyMurphyTM1234 commented 2 years ago

This is a list of all issues logged that I can find that relate to adding new instructions to the toolchain:

At this stage in the evolution of RISC-V,the majority of instruction addition will be done under the auspices of the relevant groups working on standardising, prototyping, and implementing/enhancing RISC-V extensions and their corresponding instructions. As such, there should be less of a demand for others to do so.

However, where somebody still wants to implement a non stadard custom extension/instruction here are some useful resources: