Also relevant to Q.
I was investigating this in the berkeley-hardfloat implementation, as I wanted to understand whether FMV.X.D followed by SD is equivalent to FSD. If they are not equivalent, that would provide a means of determining that an embedded single float exists.
However, I am concerned about overuse of the embedded single/double in double/quad when such representations are not guaranteed to interoperate across implementations. Applications that rely on the dual representation will not be portable. And although RISC-V is intended to support "implementations, including heterogeneous multiprocessors", transferring a "user image" to a core that uses a different encoding for this dual representation appears to be impossible with the current guarantees. I was considering whether extending FCLASS.D with bits 10 and 11 to indicate single and double values would help ameliorate this problem (a sketch of what that could look like follows below).
After I understood what the berkeley-hardfloat implementation did, I was planning on asking the question on the ISA-DEV group.
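For concreteness, a minimal sketch of how software might consume such an extension, assuming bits 10 and 11 of the FCLASS.D result were assigned as proposed; the bit positions, the macros, and the `fclass_d` wrapper are all hypothetical, not anything in the frozen spec:

```c
#include <stdint.h>

/* Hypothetical bit assignments for the proposed FCLASS.D extension;
 * today's FCLASS.D only defines bits 0-9. */
#define FCLASS_HOLDS_SINGLE (1u << 10) /* register currently holds an F value */
#define FCLASS_HOLDS_DOUBLE (1u << 11) /* register currently holds a D value */

static inline uint32_t fclass_d(double x)
{
    uint32_t cls;
    __asm__("fclass.d %0, %1" : "=r"(cls) : "f"(x));
    return cls;
}

/* A debugger or migrator could then pick the right interpretation. */
static int reg_holds_single(double x)
{
    return (fclass_d(x) & FCLASS_HOLDS_SINGLE) != 0;
}
```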
Standard software should never rely on the encoding of a float32 embedded in a float64.
Alice ran a RISC-V program which uses float variables. It crashed, and she emailed me the core dump (ok, that's a little weird, but not that weird). Core dumps contain a copy of the ptrace_regs struct; since the kernel doesn't know how a register is being used at a given time, it stores all F-registers in double precision. I loaded the core dump into gdb on a different computer with a different CPU vendor. My copy of gdb was able to interpret the saved register values and print the correct values of float local variables on the call stack, which would imply that the recoding algorithms had at some point been documented and standardized, and that gdb was somehow able to determine which one was in use.
Bob argued the gdb implementation relied upon non-standard behavior with respect to Sec. 8.2 of the RV user ISA spec, but nevertheless acknowledged that it was useful.
CC @mwachs5 as I think this is relevant to the debug spec as well. It is useful if external debug is able to correctly interpret the value in a register when dumping the register file. Implementation-defined encodings are obviously a barrier to that, although the config-string could be used to indicate the implementation-chosen encoding.
The FCLASS.D and FCLASS.Q flags for F and D (and perhaps Q) values are valuable in debugging, as otherwise there is no direct way to discover what the floating-point registers hold.
Sorry - added this before my page refreshed - you were already going this way on this thread.
I'm not sure the current HardFloat FPU design can actually tell from the bit pattern what's stored there; it might need additional logic to remember the precision in each FP register. I'm also not sure that FCLASS by itself really solves the problem, as the bit vector might not be present in a register when you need to interpret it, so unless you require that software always dump this information alongside any opaque FP bit vector in memory, it won't help. I think it would be better to define special higher-precision NaN encodings to represent the lower-precision value stored in the higher-precision register, but this would still need the extra state in the register file and more datapath logic, and could result in non-standard NaN handling (though maybe not if we're clever).
[ Saving the below as a thought exercise on FCLASS, but I don't think this is the solution: if we consider extending FCLASS (which I'm not sure is enough), there are a few paths to doing this, including some that are a bit more awkward but retain backwards compatibility.

- Option a) Add four bits to the FCLASS result indicating an H/F/D/Q floating-point value in the register (the vector extension adds half-precision floating-point scalar operands). This would strictly break backwards compatibility, but not really affect most code that uses FCLASS; no commercial hard FPUs are out yet, and there have been other minor changes around NaNs since the spec was frozen by Berkeley.
- Option b) Same as option a), except implementations can choose not to implement the new feature by not setting any precision bit; a debugger etc. can determine whether the information is available by checking if any bit is set. This is obviously less preferable for new code in the future, but is backwards compatible.
- Option c) Only provide bits for H/D/Q; if they are all clear, then the precision is F. This would be backwards compatible with existing implementations that have single-precision FPUs. ]
We should also be aware that some language runtime implementations (LuaJIT, JSCore, SpiderMonkey) like to use NaN encodings to store pointers in a double ("nan-boxing"). There is a pretty huge encoding space here, but perhaps one of the specifications (a "platform" spec if not the isa spec) can indicate NaN encodings normally available for application use.
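To illustrate why the encoding space matters, here is a generic sketch of the nan-boxing trick those runtimes use; the tag constant and the 48-bit pointer assumption are illustrative, not any particular engine's layout:

```c
#include <stdint.h>
#include <string.h>

#define QNAN_ENVELOPE 0x7FF8000000000000ull /* quiet-NaN exponent + quiet bit */
#define PTR_TAG       0x0001000000000000ull /* distinguishes boxed pointers
                                               from ordinary qNaNs */
#define PAYLOAD_MASK  0x0000FFFFFFFFFFFFull /* 48-bit payload */

static double box_ptr(void *p)
{
    uint64_t bits = QNAN_ENVELOPE | PTR_TAG | ((uintptr_t)p & PAYLOAD_MASK);
    double d;
    memcpy(&d, &bits, sizeof d); /* still classifies as a NaN */
    return d;
}

static void *unbox_ptr(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (void *)(uintptr_t)(bits & PAYLOAD_MASK);
}
```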
Are there more ways to implement this guarantee than 1) the float32 (possibly internal) bit pattern embedded in a float64 NaN, and 2) the float64 representation of the float32 value?
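For concreteness, the two candidates in C; the all-ones upper half shown for 1) matches the NaN envelope the spec eventually adopted, but any implementation-chosen NaN envelope has the same structure:

```c
#include <stdint.h>
#include <string.h>

/* 1) Embed the raw binary32 pattern inside a float64 NaN envelope. */
static uint64_t serialize_boxed(float f)
{
    uint32_t f32;
    memcpy(&f32, &f, sizeof f32);
    return 0xFFFFFFFF00000000ull | f32; /* all-ones upper half => NaN */
}

/* 2) Store the float64 representation of the float32 value (the widening
 *    conversion is value-exact; NaN payloads are another matter). */
static uint64_t serialize_widened(float f)
{
    double d = (double)f;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```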
@asb - for heterogeneous interop, the permutations, especially for #1, are potentially so massive as not to be reasonably encodable in a config-string. (The internal float32 representation could be up to 50-ish bits, and the signaling-NaN values could be an arbitrary function of that value and even of the float hardware designation - an opaque token.)
It is more feasible to use the source implementation to do the float32 extraction (at the time of dump or task migration), which appears to require explicit notification of the freg's current value type.
Any hardware implementation must already have a way of saving the IEEE representation of a 32-bit float stored in a 64/65-bit internal FPU register, in order to implement FSW and FMV, so requiring that it stuff that representation into part of the encoding of an opaque FP register is not a big stretch. If we were to go down this path, that's the only sensible external encoding to use (plus some special NaN envelope). However, I believe the big pain point is the need for extra internal state to remember the type of the value stored in each register.
@DSHorner: Fully agreed; to be reasonably encoded in the config string you'd need to support the specification of a limited set of possible encodings. It might be enough to have the standard define a "recommended" encoding and have the config string indicate if it isn't used (i.e. you're on your own). I agree it's more general to rely on FCLASS; your task migration example is another good motivation for coming up with a solution here.
My understanding is that as it stands, there is no way to fully support migration of FP registers (e.g. by context switching a task to another core) on a heterogeneous cluster where it is unknown whether the FPU implementations are identical.
With @DSHorner's FCLASS suggestion, the context switch code would use FCLASS to determine the precision of the value in each f-register, save it with the matching store width, and record the precision tag so the restoring core can re-encode it (see the sketch below).
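A minimal sketch of the save half, assuming the hypothetical FCLASS precision bit from earlier and a low-level stub that has already captured the raw register images and FCLASS results; everything here is illustrative:

```c
#include <stdint.h>

struct fpr_slot {
    uint64_t bits;      /* raw 64-bit image of the f-register */
    uint8_t  is_single; /* precision tag for the restoring core */
};

/* fclass_bits[i] is the FCLASS.D result for f-register i; raw_regs[i] is
 * its 64-bit image. The restoring core uses is_single to decide whether
 * to re-encode the low 32 bits in its own boxed-single format. */
void tag_fp_state(const uint32_t fclass_bits[32],
                  const uint64_t raw_regs[32],
                  struct fpr_slot slots[32])
{
    for (int i = 0; i < 32; i++) {
        slots[i].bits      = raw_regs[i];
        slots[i].is_single = (fclass_bits[i] & (1u << 10)) != 0;
    }
}
```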
What do other platforms do here? Assuming the encoding is known to be the same, is it always safe+IEEE compliant to fsd an FPR used to hold a single precision value, then later fld it back into the register file and treat it as a single-precision value?
@asb Consider what happens if procedure A is using `fs0` for a float and calls procedure B, which also uses `fs0`: procedure B spills `fs0` to the stack using FSD. Which means that there are now recoded values on the stack. Which means that your "heterogeneous cluster migrator" now needs, at the very least, a DWARF unwinder to find all of the spill slots that might need rewriting, and most likely application knowledge as well (especially if the application is something like riscv/riscv-go that does "interesting" things with stacks).
I think migration is only going to happen between chips with compatible float formats, but I'm not really bothered by this since you already need to keep AT_HWCAP stuff the same.
@DSHorner That might work for Berkeley's 65-bit registers, but it doesn't work for spill slots halfway up the stack, and it won't work on implementations with 64-bit registers that store IEEE values directly (I suspect these will not be uncommon).
@sorear: Thanks, you're right that I was missing the bigger picture by just worrying about saving/restoring registers - fixing that is insufficient. Explicitly documenting that cores should use compatible float formats if they want to support process migration between them is one way to go; if this were done, the specification should suggest a preferred encoding so people without a reason to opt out at least cluster around a common solution. There are still problems.
Two additional considerations for coverage: with the introduction of float16 (half precision), will there be a requirement that FSW also support the dual-type guarantee? Applications must ensure they write external values in the correct format for the float type. The hazard introduced by embedded lesser formats is potentially high relative to the realized optimization, especially when there is no mitigation possible by directly checking/using an explicit type value.
As I now understand the problem, I think picking a standard encoding/serialisation is the sensible way forward. I've written up a summary document, and kicked off a discussion on isa-dev.
@asb Thank you so much for this: "I've written up a summary document, and kicked off a discussion on isa-dev."
This has been effectively resolved on the mailing list, but I will wait for @kasanovic's writeup to hit the repo before closing the issue.
NaN-boxing solution added to spec.
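Concretely, under the adopted scheme an f-register image holds a valid boxed binary32 only when its upper 32 bits are all ones; anything else is treated by single-precision instructions as the canonical NaN. A small C helper makes the consequence explicit:

```c
#include <stdint.h>

/* Valid boxed single: upper 32 bits all ones. */
static int is_boxed_single(uint64_t reg_image)
{
    return (reg_image >> 32) == 0xFFFFFFFFu;
}

/* What an F-type instruction effectively sees: the low 32 bits if the
 * box is valid, otherwise the canonical quiet NaN (0x7FC00000). */
static uint32_t effective_single(uint64_t reg_image)
{
    return is_boxed_single(reg_image) ? (uint32_t)reg_image : 0x7FC00000u;
}
```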
I believe the following statements in the spec are necessary but insufficient:
> FSD and FMV.X.D should be defined to create the same implementation-defined values as each other, and FLD and FMV.D.X should restore them equivalently. In particular, FSD followed by LD and FMV.D.X should properly recreate the single-precision value, as should FMV.X.D followed by SD and FLD.
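These requirements are directly testable. A RISC-V-only sketch, assuming GCC-style inline asm (the "f" constraint names the whole f-register, so FSD and FMV.X.D see the full boxed contents):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t image_via_fsd(float f)
{
    uint64_t mem;
    __asm__("fsd %1, %0" : "=m"(mem) : "f"(f));
    return mem;
}

static uint64_t image_via_fmv(float f)
{
    uint64_t x;
    __asm__("fmv.x.d %0, %1" : "=r"(x) : "f"(f));
    return x; /* an SD of x would write exactly these bits */
}

int main(void)
{
    volatile float f = 1.5f; /* a live single-precision value in an f-reg */
    assert(image_via_fsd(f) == image_via_fmv(f));
    return 0;
}
```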