ucb-bar / berkeley-hardfloat

Other
287 stars 87 forks source link

Update resizeRawFloat.scala #80

Closed FJShen closed 11 months ago

FJShen commented 11 months ago

This code would actually have caused a "high index XX is out of range" error in Chisel 5

FJShen commented 11 months ago

I've honestly not tested this patch, but based on logic reasoning I thought the change would do what the original code was "intended for". After inspecting the source code for rawFloatFromFN and rawFloatFromRecFN, and also the case in this source file where the output significand is wider than the input significand, it is my impression that the proposed change is in line with existing code. Of course I may have overlooked corner cases, but given that the "rawFloat" isn't actually documented with a specification, I can't really tell. I'm just trying to make an ad-hoc fix for this code to work in my own use case (otherwise indexing into a bit higher than the upper bound of a field just doesn't make sense), but I'm open to informative suggestions on how to make this better.

FJShen commented 11 months ago

By the way I have searched in the hardfloat folder, and it looks like resizeRawFloat is not used anywhere. Is this method used in any external repos or projects?

jerryz123 commented 11 months ago

This patch matches the original intention of this code, which appears to implement significand rounding with round-towards-\infty. The upper bits of the significand are extracted from the source significand, and the lower bit is or'd from the remaining bits of the source significand. I'm not sure this is a Chisel5-specific error though...

Doing some archaeology, it seems this code predates support for proper rounding modes in this library, and serves no purpose at this point, as you've noticed.

FJShen commented 11 months ago

The reason why I'm trying to use this resizeRawFloat method, is because I hope to use RawFloat as the intermediate format of FP numbers in my design.

I'm building a fixed-function multi-stage pipeline (similar to a pipelined FFT module in architecture). Currently I'm registering numbers in RecFN format between all stages, and the RTL synthesis shows very high power usage in "logic " cells (contrary to "sequential"). I'm speculating if that's because all FP numbers need be converted to/from RawFloat format before/after each arithmetic unit, which needlessly bloats the logic footprint.

Right now I'm trying to store RawFloat in the intermediate stages, so resizeRawFloat is a good tool for me to control the bit width.

jerryz123 commented 11 months ago

. I'm speculating if that's because all FP numbers need be converted to/from RawFloat format before/after each arithmetic unit, which needlessly bloats the logic footprint.

This would be surprising to me, the conversion should only be a handful of gates, compared to any arithmetic logic.

object rawFloatFromRecFN
{
    def apply(expWidth: Int, sigWidth: Int, in: Bits): RawFloat =
    {
        val exp = in(expWidth + sigWidth - 1, sigWidth - 1)
        val isZero    = exp(expWidth, expWidth - 2) === 0.U
        val isSpecial = exp(expWidth, expWidth - 1) === 3.U

        val out = Wire(new RawFloat(expWidth, sigWidth))
        out.isNaN  := isSpecial &&   exp(expWidth - 2)
        out.isInf  := isSpecial && ! exp(expWidth - 2)
        out.isZero := isZero
        out.sign   := in(expWidth + sigWidth)
        out.sExp   := exp.zext
        out.sig    := 0.U(1.W) ## ! isZero ## in(sigWidth - 2, 0)
        out
    }
}

Perhaps you want to store FP numbers as RecFN, not RawFloat? And avoid the conversions between FN and RecFN? This is what some existing pipelines do (notably, the FP pipelines in Rocket and BOOM).

In any case, we can merge this, since this appears to fix an actual bug.

FJShen commented 11 months ago

I am storing FP numbers as RecFN. While the conversion from RecFN to RawFloat is trivial, the logic for the reverse conversion is quite complicated.

For example, all of AddRecFN, MulRecFN and MulAddRecFN instantiates the RoundRawFNToRecFN module at the end. Here's what the emitted SV code looks like for RoundRawFNToRecFN(8, 24, 0):

// Generated by CIRCT firtool-1.38.0
module RoundAnyRawFNToRecFN(    // <stdin>:3:10
  input         io_in_isNaN,    // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16
                io_in_isInf,    // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16
                io_in_isZero,   // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16
                io_in_sign, // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16
  input  [9:0]  io_in_sExp, // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16
  input  [26:0] io_in_sig,  // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16
  output [32:0] io_out  // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16
);

  wire [8:0]  _roundMask_T_1 = ~(io_in_sExp[8:0]);  // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:155:37, external/hardfloat/src/main/scala/primitives.scala:52:21
  wire [64:0] _GEN = {59'h0, _roundMask_T_1[5:0]};  // external/hardfloat/src/main/scala/primitives.scala:52:21, :59:26, :76:56
  wire [64:0] roundMask_shift = $signed(65'sh10000000000000000 >>> _GEN);   // external/hardfloat/src/main/scala/primitives.scala:76:56
  wire [64:0] roundMask_shift_1 = $signed(65'sh10000000000000000 >>> _GEN); // external/hardfloat/src/main/scala/primitives.scala:76:56
  wire [24:0] _roundMask_T_73 =
    _roundMask_T_1[8]
      ? (_roundMask_T_1[7]
           ? {~(_roundMask_T_1[6]
                  ? 22'h0
                  : ~{roundMask_shift[42],
                      roundMask_shift[43],
                      roundMask_shift[44],
                      roundMask_shift[45],
                      roundMask_shift[46],
                      {{roundMask_shift[47:46], roundMask_shift[49]} & 3'h5, 1'h0}
                        | {roundMask_shift[49:48], roundMask_shift[51:50]} & 4'h5,
                      roundMask_shift[51],
                      roundMask_shift[52],
                      roundMask_shift[53],
                      roundMask_shift[54],
                      roundMask_shift[55],
                      roundMask_shift[56],
                      roundMask_shift[57],
                      roundMask_shift[58],
                      roundMask_shift[59],
                      roundMask_shift[60],
                      roundMask_shift[61],
                      roundMask_shift[62],
                      roundMask_shift[63]}),
              3'h7}
           : {22'h0,
              _roundMask_T_1[6]
                ? {roundMask_shift_1[0], roundMask_shift_1[1], roundMask_shift_1[2]}
                : 3'h0})
      : 25'h0;  // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16, external/hardfloat/src/main/scala/primitives.scala:52:21, :58:25, :59:26, :62:24, :67:24, :68:58, :73:{17,21,32}, :76:56, :77:20, :78:22
  wire        _GEN_0 = _roundMask_T_73[0] | io_in_sig[26];  // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:119:57, :158:23, external/hardfloat/src/main/scala/primitives.scala:62:24
  wire [25:0] _GEN_1 = {_roundMask_T_73[24:1], _GEN_0, 1'h1};   // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:158:{23,42}, :242:60, external/hardfloat/src/main/scala/primitives.scala:62:24
  wire [25:0] _GEN_2 =
    io_in_sig[26:1] & {1'h1, ~(_roundMask_T_73[24:1]), ~_GEN_0} & _GEN_1;   // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16, :158:{23,42}, :161:53, :162:28, :163:40, :242:60, external/hardfloat/src/main/scala/primitives.scala:62:24
  wire [25:0] roundedSig =
    (|_GEN_2)
      ? {1'h0, io_in_sig[26:2] | {_roundMask_T_73[24:1], _GEN_0}} + 26'h1
        & ~((|_GEN_2) & (io_in_sig[25:0] & _GEN_1) == 26'h0
              ? {_roundMask_T_73[24:1], _GEN_0, 1'h1}
              : 26'h0)
      : {1'h0, io_in_sig[26:2] & {~(_roundMask_T_73[24:1]), ~_GEN_0}};  // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:57:16, :158:{23,42}, :162:28, :163:{40,56}, :164:{42,62}, :172:16, :173:{32,49,57}, :174:{21,25,64}, :176:35, :179:{30,47}, :180:24, :242:60, external/hardfloat/src/main/scala/primitives.scala:62:24
  wire [10:0] sRoundedExp = {io_in_sExp[9], io_in_sExp} + {9'h0, roundedSig[25:24]};    // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:172:16, :184:{40,54}, :272:16
  wire        common_totalUnderflow = $signed(sRoundedExp) < 11'sh6B;   // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:184:40, :199:31
  wire        notNaN_isInfOut =
    io_in_isInf | ~io_in_isNaN & ~io_in_isInf & ~io_in_isZero
    & $signed(sRoundedExp[10:7]) > 4'sh2;   // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:184:40, :195:{30,50}, :236:{22,36,64}, :237:32, :247:32
  assign io_out =
    {~io_in_isNaN & io_in_sign,
     sRoundedExp[8:0] & ~(io_in_isZero | common_totalUnderflow ? 9'h1C0 : 9'h0)
       & {2'h3, ~notNaN_isInfOut, 6'h3F} | (notNaN_isInfOut ? 9'h180 : 9'h0)
       | (io_in_isNaN ? 9'h1C0 : 9'h0),
     io_in_isNaN | io_in_isZero | common_totalUnderflow
       ? {io_in_isNaN, 22'h0}
       : io_in_sig[26] ? roundedSig[23:1] : roundedSig[22:0]};  // <stdin>:3:10, external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:119:57, :158:42, :172:16, :184:40, :186:37, :188:16, :189:27, :190:27, :199:31, :247:32, :249:22, :252:{14,18,32}, :263:17, :264:{14,18}, :272:16, :276:{16,73}, :277:16, :279:{12,38}, :280:16, :285:33, external/hardfloat/src/main/scala/primitives.scala:73:21
endmodule

module RoundRawFNToRecFN(   // <stdin>:251:10
  input         io_in_isNaN,    // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:297:16
                io_in_isInf,    // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:297:16
                io_in_isZero,   // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:297:16
                io_in_sign, // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:297:16
  input  [9:0]  io_in_sExp, // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:297:16
  input  [26:0] io_in_sig,  // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:297:16
  output [32:0] io_out  // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:297:16
);

  RoundAnyRawFNToRecFN roundAnyRawFNToRecFN (   // external/hardfloat/src/main/scala/RoundAnyRawFNToRecFN.scala:308:15
    .io_in_isNaN  (io_in_isNaN),
    .io_in_isInf  (io_in_isInf),
    .io_in_isZero (io_in_isZero),
    .io_in_sign   (io_in_sign),
    .io_in_sExp   (io_in_sExp),
    .io_in_sig    (io_in_sig),
    .io_out       (io_out)
  );
endmodule
aswaterman commented 11 months ago

Yeah, that is because that is not just a conversion, it is a rounding. I believe if you were convert to a slightly wider RecFN format, IIRC +1 exponent bit and +1 significand bit, then the rounding hardware will disappear and you'll just be left with a small normalization circuit. But if you go through multiple stages of this, the slowly increasing precision could become costly, so it might be unattractive for other reasons. Storing the numbers as RawFloats has slightly higher storage cost but might still be more attractive in some circumstances.

aswaterman commented 11 months ago

BTW, the only context in which HardFloat has been verified is performing a single operation on a RawFloat, then converting back to RecFN. There might be circumstances where chaining together multiple RawFloat operations has problems--or then again, maybe not--so just make sure to test carefully.