Help needed: Integration of cvFPU with cv32e40p and adding a custom instruction

diggi0330 commented 11 months ago

Hi all (@zarubaf @Razer6 @wallento @fabianschuiki @gmarkall),

I am new to RISC-V and am stuck on the following problems while running the cv32e40p core:

cvFPU integration with cv32e40p:

(Implemented:) I verified the functionality of FP operations on the base core by setting FPU = 1 and ZFINX = 1. For this case, I set -march=rv32imc_zicsr and -mabi=ilp32. Using any other values for march and mabi results in a BSP error (as expected). Also, note that this version of the code runs with Verilator and the riscv-gnu toolchain. Since the base core does not support FPnew by default, the latency of each operation is huge.

a. Hence, I need to use the FPU (cvFPU) with cv32e40p and Verilator support to get a faster implementation of FP operations. Could you please guide me through the steps to integrate it?

I figured out that the riscv-gnu toolchain has to be recompiled with -march=rv32imcf / rv32gc and -mabi=ilp32f. (Please correct me if I'm wrong about this.)

Note: I do not have vsim support on my pc.
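
For readers following the toolchain question above, here is a minimal sketch of how the FPU and ZFINX parameters would normally map to GCC flags. It assumes the usual GCC RISC-V naming (F for a separate floating-point register file with the ilp32f ABI, Zfinx for floats held in the integer registers with the ilp32 ABI). `gcc_fp_flags` is a hypothetical helper, not part of any OpenHW script, and the exact strings should be checked against the User Manual and your toolchain version. One point worth noting: with neither F nor Zfinx in -march, GCC lowers floating-point arithmetic to software library calls, which by itself could account for high latency.

```python
def gcc_fp_flags(fpu: bool, zfinx: bool) -> dict:
    """Return assumed -march/-mabi strings for a given FPU/ZFINX setting.

    Hypothetical helper for illustration only; verify the strings against
    the CV32E40P User Manual and your GCC version before relying on them.
    """
    if not fpu:
        # No FP hardware: the compiler falls back to soft-float library calls.
        return {"march": "rv32imc_zicsr", "mabi": "ilp32"}
    if zfinx:
        # Zfinx: FP instructions operate on the integer register file,
        # so the integer calling convention (ilp32) is kept.
        return {"march": "rv32imc_zicsr_zfinx", "mabi": "ilp32"}
    # F extension: dedicated FP register file and hard-float ABI.
    return {"march": "rv32imfc_zicsr", "mabi": "ilp32f"}


if __name__ == "__main__":
    for fpu, zfinx in [(False, False), (True, True), (True, False)]:
        flags = gcc_fp_flags(fpu, zfinx)
        print(f"FPU={int(fpu)} ZFINX={int(zfinx)}: "
              f"-march={flags['march']} -mabi={flags['mabi']}")
```

If the hardware is built with FPU = 1 and ZFINX = 1 but the software is compiled with the first row's flags, no FP instruction would ever be issued to the FPU, so it may be worth checking the disassembly for FP opcodes.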

Adding a custom instruction:

I need custom instruction support for my proposed design in R4 format for a module, e.g. rd = appx(r1, r2, r3). (Please correct me if my understanding is incorrect.) Note: all the registers required by my instruction hold 8-bit operands and the instruction performs bit-level operations.

a. Could you please guide me through the available encoding space for such custom extensions? I found out through earlier issues that all the custom instructions (custom_0, custom_1, custom_2, custom_3) are already in use (as written in cv32e40p_pkg.sv line 62).

b. Alternatively (for my issue 2), can you please suggest an extension in R format for a module, e.g. rd = mod(r1, r2), where all the registers are 32-bit?

c. On inspecting cv32e40p_pkg.sv, I found that the encoding instr_rdata_o[6:0] = 0xe; instr_rdata_o[14:12] = 0x0; instr_rdata_o[31:25] = 0x0 is free. Can I use this as my instruction encoding? Also, I know that I need to make changes to cv32e40p_decoder.sv and cv32e40p_alu.sv. Can you please let me know if more files need to be changed for this?

d. Or should I just create a new pipeline stage to avoid any conflict with the existing structure as we do for FPU? Any suggestions on these implementations are welcome.

Thanks, Digvijay
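
A quick way to sanity-check a candidate encoding like the one in item c is to split the word into its R-type fields and look at the low opcode bits. The sketch below assumes only the standard RISC-V base encoding rules: a 32-bit instruction has bits [1:0] equal to 0b11 and bits [4:2] not equal to 0b111, and the four custom opcodes are 0x0b, 0x2b, 0x5b and 0x7b. The function names are made up for illustration. If the value 0xe in item c is read literally as bits [6:0], its low two bits are 0b10, which falls in the 16-bit compressed space; the intended value may of course have been a different field or notation.

```python
# Standard RISC-V custom opcode values (bits [6:0]).
CUSTOM_OPCODES = {
    "custom-0": 0x0B,
    "custom-1": 0x2B,
    "custom-2": 0x5B,
    "custom-3": 0x7B,
}


def is_32bit_encoding(opcode7: int) -> bool:
    """True if the 7-bit opcode marks a standard 32-bit instruction.

    Bits [1:0] must be 0b11, and bits [4:2] must not be 0b111
    (that pattern is reserved for longer instruction lengths).
    """
    return (opcode7 & 0b11) == 0b11 and ((opcode7 >> 2) & 0b111) != 0b111


def r_type_fields(word: int) -> dict:
    """Split a 32-bit instruction word into R-type fields."""
    return {
        "opcode": word & 0x7F,          # bits [6:0]
        "rd": (word >> 7) & 0x1F,       # bits [11:7]
        "funct3": (word >> 12) & 0x7,   # bits [14:12]
        "rs1": (word >> 15) & 0x1F,     # bits [19:15]
        "rs2": (word >> 20) & 0x1F,     # bits [24:20]
        "funct7": (word >> 25) & 0x7F,  # bits [31:25]
    }


if __name__ == "__main__":
    print(is_32bit_encoding(0x0E))  # False: low bits are 0b10
    for name, op in CUSTOM_OPCODES.items():
        print(name, hex(op), is_32bit_encoding(op))  # all True
    print(r_type_fields(0x003100B3))  # add x1, x2, x3
```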

gmarkall commented 11 months ago

@diggi0330 Why did you tag me? (I'm wondering if it's because I wrote a guide to adding custom instructions to RI5CY years ago. If so, I haven't kept that up to date with the OpenHW Group cores like cv32e40p, so I'm concerned you might be following outdated info.)

MikeOpenHWGroup commented 11 months ago

Hi @diggi0330, thanks for your interest in OpenHW! Note that none of the individuals you "at mentioned" are currently involved in the CV32E40P. The current version of this core has explicit support for the floating point unit, which you can control via top-level parameters at compile/synthesis time. Check out the Core Integration chapter of the User Manual for details about how that is done.

We no longer explicitly support Verilator with any of our scriptware. Having said that, there should be no reason why the CV32E40P cannot be compiled and simulated with an up-to-date version of Verilator. Have a look at the core testbench in CORE-V-VERIF for a good starting point. Note that we do not maintain that testbench, so it will be rather out of date. A pull request to update it would be welcome!

If you are interested in adding custom instructions to a RISC-V core, have a look at the CV32E40X.

diggi0330 commented 11 months ago

Hi @MikeOpenHWGroup and others, sorry for bugging you earlier and tagging you unknowingly. Also, thank you for your prompt reply to my questions.

A quick update: I was able to extend the GNU assembler to support both R4-type and R-type custom instruction extensions. I used a reserved OPCODE to get the required functionality.
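
For anyone who wants to cross-check the words an extended assembler emits, the R-type and R4-type layouts can be sketched in a few lines. This is only an illustration of the standard field positions from the RISC-V spec, not the assembler patch described above. `encode_r`, `encode_r4` and `PLACEHOLDER_OPCODE` are hypothetical names, and the placeholder value 0x0b (custom-0) is used purely as an example since the reserved opcode actually chosen is not stated in this thread.

```python
PLACEHOLDER_OPCODE = 0x0B  # assumption: custom-0, used here only as an example


def _check(name: str, value: int, bits: int) -> int:
    """Reject field values that do not fit in the given bit width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def encode_r(opcode, funct3, funct7, rd, rs1, rs2) -> int:
    """R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return (_check("funct7", funct7, 7) << 25
            | _check("rs2", rs2, 5) << 20
            | _check("rs1", rs1, 5) << 15
            | _check("funct3", funct3, 3) << 12
            | _check("rd", rd, 5) << 7
            | _check("opcode", opcode, 7))


def encode_r4(opcode, funct3, funct2, rd, rs1, rs2, rs3) -> int:
    """R4-type: rs3 | funct2 | rs2 | rs1 | funct3 | rd | opcode."""
    return (_check("rs3", rs3, 5) << 27
            | _check("funct2", funct2, 2) << 25
            | _check("rs2", rs2, 5) << 20
            | _check("rs1", rs1, 5) << 15
            | _check("funct3", funct3, 3) << 12
            | _check("rd", rd, 5) << 7
            | _check("opcode", opcode, 7))


if __name__ == "__main__":
    # Known encodings as a sanity check.
    print(hex(encode_r(0x33, 0, 0, rd=1, rs1=2, rs2=3)))           # 0x3100b3 (add x1, x2, x3)
    print(hex(encode_r4(0x43, 0, 0, rd=1, rs1=2, rs2=3, rs3=4)))   # 0x203100c3 (fmadd.s, rm=0)
    # Hypothetical custom R4 instruction: rd = appx(rs1, rs2, rs3).
    word = encode_r4(PLACEHOLDER_OPCODE, 0, 0, rd=10, rs1=11, rs2=12, rs3=13)
    print(f"{word:#010x}")
```

As far as I know, GNU as also has a `.insn` directive with `r` and `r4` formats that can emit such words without modifying the assembler, which may be handy for quick experiments.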

> The current version of this core has explicit support for the floating point unit which you can control via top-level parameters at compile/synthesis time. Check out the Core Integration chapter of the User Manual for details about how that is done.

According to my understanding of the code, this is the case where we use the APU with FPU = 1 and ZFINX = 1. This performs the required FP operations; however, the latency is huge since it does not use cvFPU. (Please correct me if I am wrong about this.) I have already verified this with Verilator and got the correct result.

However, I am still stuck on including FPU support with the cv32e40p core. Please guide me on how to compile the cv32e40pV2 with UVM support. I can't locate the cv32e40pV2 GitHub repository.

MikeOpenHWGroup commented 11 months ago

Hi @diggi0330, there is only one CV32E40P repo. You determine which version you compile/synthesize using the top-level instantiation parameters. I strongly recommend reading the User Manual (not the code!) to learn how this is done. Pay particular attention to the Core Integration and Core Versions chapters.

diggi0330 commented 11 months ago

Thanks for the help. I’ll try to go through the user manual.

openhwgroup / cv32e40p, issue #924