Clarification on Zfinx behavior for subnormal operands and results needed

jstraus59 commented 1 year ago

The NEORV32 documentation Zfinx extension documentation states:

Subnormal numbers (exponent = 0) are flushed to zero setting them to +/- 0 before entering the FPU’s processing core. If a computational instruction (like fmul.s) generates a subnormal result, the result is also flushed to zero during normalization.

Would it be possible to be add more detail on subnormal flushing, specifically:

How are the fflag bits set when subnormal inputs are seen, or when subnormal results are flushed to zero. A precise specification for each instruction class (Computational, Conversion, Compare and Clasify) would be helpful.
For input flushing, it simply states flushed...before entering the FPU’s processing core, but exactly which instruction classes are implemented in the FPU core, and thus input flushing applies to them, is not clear. Obviously, this applies to the Computational instruction class, but an explicit specification of whether this also applies to the Conversion, Compare and Clasify instruction classes would be helpful.

Thanks!

Jim Straus Imperas

mikaelsky commented 1 year ago

@jstraus59 from my reading of the code the first step in the FPU is to determine the "class" of a given float to extract +/- inf, NAN etc. before passing them further into the FPU. During this intake process if an number with exp 0 is seen the mantissa just gets set to 0, resulting in any subnormal being converted to a 0. E.g a subnormal A + subnormal B gets converted to 0.0 + 0.0. As this a pre-processing step it happens disregarding instruction class. I'll do a bit more digging but thats the gist of it.

Currently the fflag bits are .. not functional in general which include subnormals. This is on my agenda to fix, so assume that fflags will be functional going forward.

jstraus59 commented 1 year ago

@mikaelsky Terrific - thanks for the info. I will assume that inputs to the Conversion and Computation instructions have their subnormal inputs flushed to 0.

I still need a couple details clarified, please (sorry but I have never done RTL design - just modeling of processor architectures - so it would be difficult for me to figure this out directly from the RTL myself):

1) I am wondering still about the FCLASS instruction, which simply reports the type of floating-point value that is in a register. One of the things it detects is whether the value is a subnormal, so it would not really make sense for subnormal inputs to be flushed for the FCLASS instruction, but if FCLASS is implemented within the FPU it sounds like they would. Could you please confirm?

2) I am assuming that subnormal inputs and results are actually flushed to +/-0, not just +0. Could you please confirm that, also?

mikaelsky commented 1 year ago

@jstraus59 good question. Currently deep into fixing an issue with misaligned branches being out of spec. From a quick scan FCLASS currently never reports a denorm/subnormal class: op_is_denorm_v := '0'; -- FIXME / TODO -- op_e_all_zero_v and (not op_m_all_zero_v); -- subnormal

Correct the on the input path Rs1 and Rs2 gets converted to the equivalent of a +/- 0.0. Basically of EXP == 0 set MANT = 0; op_data(0)(22 downto 00) <= (others => '0') when (rs1_i(30 downto 23) = "00000000") else rs1_i(22 downto 0); -- flush mantissa to zero if subnormal

For the output side a subnormal result gets converted to +/- 0.0 as the result sign gets preserved and the EXP and MANT both get set to 0. elsif (ctrl.flags(fp_exc_uf_c) = '1') or -- underflow (sreg.zero = '1') or (ctrl.class(fp_class_neg_denorm_c) = '1') or (ctrl.class(fp_class_pos_denorm_c) = '1') then -- denormalized (flush-to-zero) ctrl.res_exp <= fp_single_pos_zero_c(30 downto 23); -- keep original sign ctrl.res_man <= fp_single_pos_zero_c(22 downto 00);

jstraus59 commented 1 year ago

Ok - that answers my questions about everything but the fflags behavior.

I will leave the issue open as a reminder to document the precise fflags behavior w.r.t. subnormals once that implementation is finished.

Thanks @mikaelsky!

stnolting commented 1 year ago

Hey everyone! Sorry for the late reply... So thanks @mikaelsky for jumping in! :wink:

Right now all input operands are being flushed-to-zero if they are subnormal before entering the actual FPU core. Hence, @mikaelsky is right saying that this flush-to-zero is applied to all instruction classes.

However, I think you are right that this does not make much sense for the fclass instruction. I'll fix that.

I just saw there is a bug in the wiring of the exception flags output. I'll also fix that and I will also add some test cases to the floating-point tests to cover some simple conditions that should raise FPU exceptions.

mikaelsky commented 1 year ago

@stnolting working on a misalign branch issue. Basically the core is subtly out of spec. Thought I had an "easy" fix, but JALR is not behaving. Basically any misalignment is supposed to be captured by the offending branch and trapped without updating the link register nor fetching from the misaligned PC.

I also have a set of MTVAL spec updates from various traps to help with spec compliance. Unsure when it makes sense to drop them in, kinda waiting for the 1.9.1 release before trying to apply updates.

On top I've added an RVVI trace port to help validate against an ISA like Imperas. This is where we've found the above mentioned spec issues.

stnolting commented 1 year ago

Oh, sorry for overlooking that earlier comment... 🙈

I think you are right. If the CPU encounters a misaligned instruction fetch the link register will still be written (for link instructions) and the pipeline's front end will still try to read instruction data from that unaligned address... I will have a closer look at this. Thanks for finding this bug!

stnolting commented 1 year ago

I have opened #734 to fix this misalignment-issue.

@mikaelsky

On top I've added an RVVI trace port to help validate against an ISA like Imperas. This is where we've found the above mentioned spec issues.

I'm curious... Can you say something more about that verification setup? Please let us know if there are any more spec-compatibility issues 🙈😅

mikaelsky commented 1 year ago

@stnolting we are "working over the core" so to speak using an system from Imperas, https://www.imperas.com/imperas-riscv-solutions.

Basically we are doing what I've always done when validating processor cores. We compare the internal states of the processor after every instruction versus a golden reference model from Imperas. This means when any instruction is "retired" we check:

Are the correct processor registers updated, and are we not missing anything.
Is the PC correct
Is the opcode correct
Is the next PC correct
Have any CSRs been updated and if so are they the right ones.
Have any memories been read/written from and is the address and data correct.

We do this for every piece of code we run, like the entire riscv-arch test suite.

To do these compares we yank out a lot of information from the core internals and provide it to the compare system. This is what the RVVI trace port is for. In my local branch I have a parameter that allows us to enable/disable the functionality, as it costs some logic for signal delays n such.

I have a few other custom parameters we use like: regfile_reset_en to turn on reset for the regfile and boot_address_en to allow us to feed an external boot address to the core to support stuff like application restore functions.

LarryImperas commented 1 year ago

@mikaelsky Thanks for the mention and the link to Imperas. Actually a more direct link to the ImperasDV processor verification tools is here: https://www.imperas.com/imperasdv.

stnolting commented 12 months ago

@mikaelsky

How do you "export" all this data from the core? Do you use something like the RISC-V Formal Interface (RVFI)? Or did you actually implement a full-scale trace port? I know there is a RISC-V proposal/extension for that, but I am not sure in what state this is...

regfile_reset_en

We have something similar now: https://github.com/stnolting/neorv32/pull/736 :wink:

and boot_address_en to allow us to feed an external boot address to the core to support stuff like application restore functions.

We are discussing something like this, too (-> https://github.com/stnolting/neorv32/issues/720).

mikaelsky commented 12 months ago

We implemented the RVFI trace port. I did explore a full blown trace port but ended up punting on that for this particular iteration. When my github is up to tip of trunk I can add it in as a separate pull/push request. I do have a generic to disable the whole thing as it adds logic to retime the signals.

I noticed #736. Appreciated :)

For the boot address I have a generic/parameter to enable/disable the feature. Default boot is still 0xFFFF_0000. There are some additional feature changes here and there to enhance some features like support clock gating for sleep mode.

I also have a few more cases for mtval in the trap controller to be more aligned to the spec. Granted the spec is "annoying" in that it says "if mtval is not 0 then it has by XYZ".

          -- NORMAL trap entry: write mcause, mepc and mtval - no update when in debug-mode! --
          -- --------------------------------------------------------------------
          if (CPU_EXTENSION_RISCV_Sdext = false) or ((trap_ctrl.cause(5) = '0') and (debug_ctrl.running = '0')) then
            -- trap cause --
            csr.mcause <= trap_ctrl.cause(trap_ctrl.cause'left) & trap_ctrl.cause(4 downto 0); -- type & identifier
            -- trap PC --
            csr.mepc <= trap_ctrl.epc(XLEN-1 downto 1) & '0';
            -- trap value --
            case trap_ctrl.cause is
              when trap_brk_c | trap_iaf_c =>
                csr.mtval <= trap_ctrl.epc(XLEN-1 downto 1) & '0';
              when trap_lma_c | trap_laf_c | trap_sma_c | trap_saf_c => -- misaligned load/store address or load/store access error
                csr.mtval <= mar_i; -- faulting data access address
              when trap_iil_c => -- illegal instruction
                csr.mtval <= execute_engine.ir; -- faulting instruction word
              when trap_ima_c => -- instruction misaligned
                -- Set mtval to be equal to the misaligned address
                csr.mtval <= execute_engine.pc_illegal_branch;
              when others => -- everything else including all interrupts
                csr.mtval <= (others => '0');
            end case;
            -- update privilege level and interrupt enable stack --
            csr.privilege    <= priv_mode_m_c; -- execute trap in machine mode
            csr.mstatus_mie  <= '0'; -- disable interrupts
            csr.mstatus_mpie <= csr.mstatus_mie; -- backup previous mie state
            csr.mstatus_mpp  <= csr.privilege; -- backup previous privilege mode
          end if;

There is also a discussion to be had around mstatus_mpp reset value. Right now its reset to 0 which is USER mode independent of whether USER mode is enabled or not. In principle it should reset to priv_mode_m_c as we come out of reset in machine mode.

For now I have this little add-on, but would probable just change it to default to '1' or machine mode out of reset.

      if (CPU_EXTENSION_RISCV_U = false) then
        csr.mstatus_mpp   <= '1';
      else
        csr.mstatus_mpp   <= '0';
      end if;

stnolting commented 12 months ago

We implemented the RVFI trace port. I did explore a full blown trace port but ended up punting on that for this particular iteration. When my github is up to tip of trunk I can add it in as a separate pull/push request. I do have a generic to disable the whole thing as it adds logic to retime the signals.

Sounds good! I'm curious what that trace port will look like.

I also have a few more cases for mtval in the trap controller to be more aligned to the spec. Granted the spec is "annoying" in that it says "if mtval is not 0 then it has by XYZ".

Yeah... the RISC-V priv. spec. is quite annoying there... 😅 I tried to support just a minimum set of features to keep the hardware requirements low (while still passing the RISC-V architecture tests).

There is also a discussion to be had around mstatus_mpp reset value. Right now its reset to 0 which is USER mode independent of whether USER mode is enabled or not. In principle it should reset to priv_mode_m_c as we come out of reset in machine mode.

Good point! I will fix that.

edit -> https://github.com/stnolting/neorv32/pull/745 :wink:

jstraus59 commented 10 months ago

Reviewing the comments about the original subject (subnormal behavior), there are still a couple issues that I don't think have been explicitly answered:

@mikaelsky wrote "Currently the fflag bits are .. not functional in general which include subnormals. This is on my agenda to fix, so assume that fflags will be functional going forward." Still need to know the precise behavior of fflags w.r.t. denormal flushing. I don't think this is defined as part of any spec (AIUI the IEEE spec does not address subnormal flushing at all.)
A precise definition of which instructions "enter the actual FPU core" is needed. Currently I am assuming the following instructions are executed in the FPU core, and thus perform denormal flushing of inputs and results: fadd.s fsub.s fmul.s fmin.s fmax.s feq.s flt.s fle.s fcvt.w.s fcvt.wu.s fcvt.s.w fcvt.s.wu And these, that simply manipulate bits in a floating point value, do not: fsgnj.s fsgnjn.s fsgnjx.s fclass.s Please advise if this is accurate.
A comment indicated that the fclass instruction has a FIXME note for the denormal flag: op_is_denorm_v := '0'; -- FIXME / TODO -- op_e_all_zero_v and (not op_m_all_zero_v); -- subnormal Can I assume that is going to be fixed?

Thanks

mikaelsky commented 10 months ago

The fflag bits are currently not correctly implemented and need debugging. This is an ongoing effort. Assume the fflags are correct and functional, as that is the intended state.
Not sure why its important whether fsgnj.s, fsgnjn.s, fsgnjx.s and fclass.s enter the FPU core or not? Practically for the neorv32 implementation ALL Zfinx instructions pass into the FPU module. This allows the RTL to be more easily removed when Zfinx isn't used. This is an important implementation feature for the neorv32 core. For the modelling effort it can assumed that these instructions work as defined in the Zfinx extension. If it easier for the modelling effort to "assume they do not enter the FPU" assume away. As long as they will flag an illegal instruction when the Zfinx extension is disabled.
This will be fixed yes.

mikaelsky commented 9 months ago

The questions seems to have been answered and the model updated to mach the core behavior. I will close this issue as its been resolved on our end. Thank you.

stnolting commented 9 months ago

@mikaelsky thank you very much for all your work on this! :)

stnolting / neorv32

Clarification on Zfinx behavior for subnormal operands and results needed #728