Closed jstraus59 closed 9 months ago
@jstraus59 from my reading of the code the first step in the FPU is to determine the "class" of a given float to extract +/- inf, NAN etc. before passing them further into the FPU. During this intake process if an number with exp 0 is seen the mantissa just gets set to 0, resulting in any subnormal being converted to a 0. E.g a subnormal A + subnormal B gets converted to 0.0 + 0.0. As this a pre-processing step it happens disregarding instruction class. I'll do a bit more digging but thats the gist of it.
Currently the fflag bits are .. not functional in general which include subnormals. This is on my agenda to fix, so assume that fflags will be functional going forward.
@mikaelsky Terrific - thanks for the info. I will assume that inputs to the Conversion and Computation instructions have their subnormal inputs flushed to 0.
I still need a couple details clarified, please (sorry but I have never done RTL design - just modeling of processor architectures - so it would be difficult for me to figure this out directly from the RTL myself):
1) I am wondering still about the FCLASS instruction, which simply reports the type of floating-point value that is in a register. One of the things it detects is whether the value is a subnormal, so it would not really make sense for subnormal inputs to be flushed for the FCLASS instruction, but if FCLASS is implemented within the FPU it sounds like they would. Could you please confirm?
2) I am assuming that subnormal inputs and results are actually flushed to +/-0, not just +0. Could you please confirm that, also?
@jstraus59 good question. Currently deep into fixing an issue with misaligned branches being out of spec. From a quick scan FCLASS currently never reports a denorm/subnormal class: op_is_denorm_v := '0'; -- FIXME / TODO -- op_e_all_zero_v and (not op_m_all_zero_v); -- subnormal
Correct the on the input path Rs1 and Rs2 gets converted to the equivalent of a +/- 0.0. Basically of EXP == 0 set MANT = 0; op_data(0)(22 downto 00) <= (others => '0') when (rs1_i(30 downto 23) = "00000000") else rs1_i(22 downto 0); -- flush mantissa to zero if subnormal
For the output side a subnormal result gets converted to +/- 0.0 as the result sign gets preserved and the EXP and MANT both get set to 0. elsif (ctrl.flags(fp_exc_uf_c) = '1') or -- underflow (sreg.zero = '1') or (ctrl.class(fp_class_neg_denorm_c) = '1') or (ctrl.class(fp_class_pos_denorm_c) = '1') then -- denormalized (flush-to-zero) ctrl.res_exp <= fp_single_pos_zero_c(30 downto 23); -- keep original sign ctrl.res_man <= fp_single_pos_zero_c(22 downto 00);
Ok - that answers my questions about everything but the fflags behavior.
I will leave the issue open as a reminder to document the precise fflags behavior w.r.t. subnormals once that implementation is finished.
Thanks @mikaelsky!
Hey everyone! Sorry for the late reply... So thanks @mikaelsky for jumping in! :wink:
Right now all input operands are being flushed-to-zero if they are subnormal before entering the actual FPU core. Hence, @mikaelsky is right saying that this flush-to-zero is applied to all instruction classes.
However, I think you are right that this does not make much sense for the fclass
instruction. I'll fix that.
I just saw there is a bug in the wiring of the exception flags output. I'll also fix that and I will also add some test cases to the floating-point tests to cover some simple conditions that should raise FPU exceptions.
@stnolting working on a misalign branch issue. Basically the core is subtly out of spec. Thought I had an "easy" fix, but JALR is not behaving. Basically any misalignment is supposed to be captured by the offending branch and trapped without updating the link register nor fetching from the misaligned PC.
I also have a set of MTVAL spec updates from various traps to help with spec compliance. Unsure when it makes sense to drop them in, kinda waiting for the 1.9.1 release before trying to apply updates.
On top I've added an RVVI trace port to help validate against an ISA like Imperas. This is where we've found the above mentioned spec issues.
Oh, sorry for overlooking that earlier comment... 🙈
I think you are right. If the CPU encounters a misaligned instruction fetch the link register will still be written (for link instructions) and the pipeline's front end will still try to read instruction data from that unaligned address... I will have a closer look at this. Thanks for finding this bug!
I have opened #734 to fix this misalignment-issue.
@mikaelsky
On top I've added an RVVI trace port to help validate against an ISA like Imperas. This is where we've found the above mentioned spec issues.
I'm curious... Can you say something more about that verification setup? Please let us know if there are any more spec-compatibility issues 🙈😅
@stnolting we are "working over the core" so to speak using an system from Imperas, https://www.imperas.com/imperas-riscv-solutions.
Basically we are doing what I've always done when validating processor cores. We compare the internal states of the processor after every instruction versus a golden reference model from Imperas. This means when any instruction is "retired" we check:
We do this for every piece of code we run, like the entire riscv-arch test suite.
To do these compares we yank out a lot of information from the core internals and provide it to the compare system. This is what the RVVI trace port is for. In my local branch I have a parameter that allows us to enable/disable the functionality, as it costs some logic for signal delays n such.
I have a few other custom parameters we use like: regfile_reset_en to turn on reset for the regfile and boot_address_en to allow us to feed an external boot address to the core to support stuff like application restore functions.
@mikaelsky Thanks for the mention and the link to Imperas. Actually a more direct link to the ImperasDV processor verification tools is here: https://www.imperas.com/imperasdv.
@mikaelsky
How do you "export" all this data from the core? Do you use something like the RISC-V Formal Interface (RVFI)? Or did you actually implement a full-scale trace port? I know there is a RISC-V proposal/extension for that, but I am not sure in what state this is...
regfile_reset_en
We have something similar now: https://github.com/stnolting/neorv32/pull/736 :wink:
and boot_address_en to allow us to feed an external boot address to the core to support stuff like application restore functions.
We are discussing something like this, too (-> https://github.com/stnolting/neorv32/issues/720).
We implemented the RVFI trace port. I did explore a full blown trace port but ended up punting on that for this particular iteration. When my github is up to tip of trunk I can add it in as a separate pull/push request. I do have a generic to disable the whole thing as it adds logic to retime the signals.
I noticed #736. Appreciated :)
For the boot address I have a generic/parameter to enable/disable the feature. Default boot is still 0xFFFF_0000. There are some additional feature changes here and there to enhance some features like support clock gating for sleep mode.
I also have a few more cases for mtval in the trap controller to be more aligned to the spec. Granted the spec is "annoying" in that it says "if mtval is not 0 then it has by XYZ".
-- NORMAL trap entry: write mcause, mepc and mtval - no update when in debug-mode! --
-- --------------------------------------------------------------------
if (CPU_EXTENSION_RISCV_Sdext = false) or ((trap_ctrl.cause(5) = '0') and (debug_ctrl.running = '0')) then
-- trap cause --
csr.mcause <= trap_ctrl.cause(trap_ctrl.cause'left) & trap_ctrl.cause(4 downto 0); -- type & identifier
-- trap PC --
csr.mepc <= trap_ctrl.epc(XLEN-1 downto 1) & '0';
-- trap value --
case trap_ctrl.cause is
when trap_brk_c | trap_iaf_c =>
csr.mtval <= trap_ctrl.epc(XLEN-1 downto 1) & '0';
when trap_lma_c | trap_laf_c | trap_sma_c | trap_saf_c => -- misaligned load/store address or load/store access error
csr.mtval <= mar_i; -- faulting data access address
when trap_iil_c => -- illegal instruction
csr.mtval <= execute_engine.ir; -- faulting instruction word
when trap_ima_c => -- instruction misaligned
-- Set mtval to be equal to the misaligned address
csr.mtval <= execute_engine.pc_illegal_branch;
when others => -- everything else including all interrupts
csr.mtval <= (others => '0');
end case;
-- update privilege level and interrupt enable stack --
csr.privilege <= priv_mode_m_c; -- execute trap in machine mode
csr.mstatus_mie <= '0'; -- disable interrupts
csr.mstatus_mpie <= csr.mstatus_mie; -- backup previous mie state
csr.mstatus_mpp <= csr.privilege; -- backup previous privilege mode
end if;
There is also a discussion to be had around mstatus_mpp reset value. Right now its reset to 0 which is USER mode independent of whether USER mode is enabled or not. In principle it should reset to priv_mode_m_c as we come out of reset in machine mode.
For now I have this little add-on, but would probable just change it to default to '1' or machine mode out of reset.
if (CPU_EXTENSION_RISCV_U = false) then
csr.mstatus_mpp <= '1';
else
csr.mstatus_mpp <= '0';
end if;
We implemented the RVFI trace port. I did explore a full blown trace port but ended up punting on that for this particular iteration. When my github is up to tip of trunk I can add it in as a separate pull/push request. I do have a generic to disable the whole thing as it adds logic to retime the signals.
Sounds good! I'm curious what that trace port will look like.
I also have a few more cases for mtval in the trap controller to be more aligned to the spec. Granted the spec is "annoying" in that it says "if mtval is not 0 then it has by XYZ".
Yeah... the RISC-V priv. spec. is quite annoying there... 😅 I tried to support just a minimum set of features to keep the hardware requirements low (while still passing the RISC-V architecture tests).
There is also a discussion to be had around mstatus_mpp reset value. Right now its reset to 0 which is USER mode independent of whether USER mode is enabled or not. In principle it should reset to priv_mode_m_c as we come out of reset in machine mode.
Good point! I will fix that.
edit -> https://github.com/stnolting/neorv32/pull/745 :wink:
Reviewing the comments about the original subject (subnormal behavior), there are still a couple issues that I don't think have been explicitly answered:
fadd.s fsub.s fmul.s fmin.s fmax.s feq.s flt.s fle.s fcvt.w.s fcvt.wu.s fcvt.s.w fcvt.s.wu
And these, that simply manipulate bits in a floating point value, do not:
fsgnj.s fsgnjn.s fsgnjx.s fclass.s
Please advise if this is accurate.op_is_denorm_v := '0'; -- FIXME / TODO -- op_e_all_zero_v and (not op_m_all_zero_v); -- subnormal
Can I assume that is going to be fixed?Thanks
The questions seems to have been answered and the model updated to mach the core behavior. I will close this issue as its been resolved on our end. Thank you.
@mikaelsky thank you very much for all your work on this! :)
The NEORV32 documentation Zfinx extension documentation states:
Subnormal numbers (exponent = 0) are flushed to zero setting them to +/- 0 before entering the FPU’s processing core. If a computational instruction (like fmul.s) generates a subnormal result, the result is also flushed to zero during normalization.
Would it be possible to be add more detail on subnormal flushing, specifically:
Thanks!
Jim Straus Imperas