Also relevant to Q.
I was investigating this in the berkeley-hardfloat implementation, as I wanted to understand whether FMV.X.D followed by SD is equivalent to FSD. If they are not equivalent, that would provide a means of determining that an embedded single float exists.
However, I am concerned about overuse of the embedded single/double in double/quad when such representations are not guaranteed to interoperate across implementations. Applications that rely on the dual representation will not be portable. And although RISC-V is intended to support "implementations, including heterogeneous multiprocessors", transferring a "user image" to a core that uses a different encoding for this dual representation appears to be impossible with the current guarantees. I was considering whether extending FCLASS.D with bits 10 and 11 to indicate single and double values would help ameliorate this problem (a sketch of what that could look like follows below).
After I understood what the berkeley-hardfloat implementation did, I was planning on asking the question on the ISA-DEV group.
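For concreteness, a minimal sketch of how software might consume such an extension, assuming bits 10 and 11 of the FCLASS.D result were assigned as proposed; the bit positions, the macros, and the `fclass_d` wrapper are all hypothetical, not anything in the frozen spec:

```c
#include <stdint.h>

/* Hypothetical bit assignments for the proposed FCLASS.D extension;
 * today's FCLASS.D only defines bits 0-9. */
#define FCLASS_HOLDS_SINGLE (1u << 10) /* register currently holds an F value */
#define FCLASS_HOLDS_DOUBLE (1u << 11) /* register currently holds a D value */

static inline uint32_t fclass_d(double x)
{
    uint32_t cls;
    __asm__("fclass.d %0, %1" : "=r"(cls) : "f"(x));
    return cls;
}

/* A debugger or migrator could then pick the right interpretation. */
static int reg_holds_single(double x)
{
    return (fclass_d(x) & FCLASS_HOLDS_SINGLE) != 0;
}
```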
Standard software should never rely on the encoding of a float32 embedded in a float64.
Alice ran a RISC-V program which uses float variables. It crashed, and she emailed me the core dump (ok, that's a little weird, but not that weird). Core dumps contain a copy of the ptrace_regs struct; since the kernel doesn't know how a register is being used at a given time, it stores all F-registers in double precision. I loaded the core dump into gdb on a different computer with a different CPU vendor. My copy of gdb was able to interpret the saved register values and print the correct values of float local variables on the call stack, which would imply that the recoding algorithms had at some point been documented and standardized, and that gdb was somehow able to determine which one was in use.
Bob argued the gdb implementation relied upon non-standard behavior with respect to Sec. 8.2 of the RV user ISA spec, but nevertheless acknowledged that it was useful.
CC @mwachs5 as I think this is relevant to the debug spec as well. It is useful if external debug is able to correctly interpret the value in a register when dumping the register file. Implementation-defined encodings are obviously a barrier to that, although the config-string could be used to indicate the implementation-chosen encoding.
The FCLASS.D and FCLASS.Q flags for F and D (and perhaps Q) values are valuable in debugging, as otherwise there is no direct way to discover what the floating-point registers hold.
Sorry - added this before my page refreshed - you were already going this way on this thread.
I'm not sure the current HardFloat FPU design can actually tell from the bit pattern what's stored there; it might need additional logic to remember the precision in each FP register. I'm also not sure that FCLASS by itself really solves the problem, as the bit vector might not be present in a register when you need to interpret it, so unless you require that software always dump this information alongside any opaque FP bit vector in memory, it won't help. I think it would be better to define special higher-precision NaN encodings to represent the lower-precision value stored in the higher-precision register, but this would still need the extra state in the register file and more datapath logic, and could result in non-standard NaN handling (though maybe not if we're clever).
[ Saving the below as a thought exercise on FCLASS, but I don't think this is the solution: if we consider extending FCLASS (which I'm not sure is enough), there are a few paths to doing this, including some that are a bit more awkward but retain backwards compatibility.

- Option a) Add four bits to the FCLASS result indicating an H/F/D/Q floating-point value in the register (the vector extension adds half-precision floating-point scalar operands). This would strictly break backwards compatibility, but not really affect most code that uses FCLASS; no commercial hard FPUs are out yet, and there have been other minor changes around NaNs since the spec was frozen by Berkeley.
- Option b) Same as option a), except implementations can choose not to implement the new feature by not setting any precision bit; a debugger etc. can determine whether the information is available by checking if any bit is set. This is obviously less preferable for new code in the future, but is backwards compatible.
- Option c) Only provide bits for H/D/Q; if they are all clear, then the precision is F. This would be backwards compatible with existing implementations that have single-precision FPUs. ]
We should also be aware that some language runtime implementations (LuaJIT, JSCore, SpiderMonkey) like to use NaN encodings to store pointers in a double ("nan-boxing"). There is a pretty huge encoding space here, but perhaps one of the specifications (a "platform" spec if not the isa spec) can indicate NaN encodings normally available for application use.
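To illustrate why the encoding space matters, here is a generic sketch of the nan-boxing trick those runtimes use; the tag constant and the 48-bit pointer assumption are illustrative, not any particular engine's layout:

```c
#include <stdint.h>
#include <string.h>

#define QNAN_ENVELOPE 0x7FF8000000000000ull /* quiet-NaN exponent + quiet bit */
#define PTR_TAG       0x0001000000000000ull /* distinguishes boxed pointers
                                               from ordinary qNaNs */
#define PAYLOAD_MASK  0x0000FFFFFFFFFFFFull /* 48-bit payload */

static double box_ptr(void *p)
{
    uint64_t bits = QNAN_ENVELOPE | PTR_TAG | ((uintptr_t)p & PAYLOAD_MASK);
    double d;
    memcpy(&d, &bits, sizeof d); /* still classifies as a NaN */
    return d;
}

static void *unbox_ptr(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (void *)(uintptr_t)(bits & PAYLOAD_MASK);
}
```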
Are there more ways to implement this guarantee than 1) the float32 (possibly internal) bit pattern embedded in a float64 NaN, and 2) the float64 representation of the float32 value?
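For concreteness, the two candidates in C; the all-ones upper half shown for 1) matches the NaN envelope the spec eventually adopted, but any implementation-chosen NaN envelope has the same structure:

```c
#include <stdint.h>
#include <string.h>

/* 1) Embed the raw binary32 pattern inside a float64 NaN envelope. */
static uint64_t serialize_boxed(float f)
{
    uint32_t f32;
    memcpy(&f32, &f, sizeof f32);
    return 0xFFFFFFFF00000000ull | f32; /* all-ones upper half => NaN */
}

/* 2) Store the float64 representation of the float32 value (the widening
 *    conversion is value-exact; NaN payloads are another matter). */
static uint64_t serialize_widened(float f)
{
    double d = (double)f;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```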
@asb - for heterogeneous interop, the permutations, especially for #1, are potentially so massive as not to be reasonably encodable in a config-string. (The internal float32 representation could be up to 50-ish bits, and the signaling-NaN values could be an arbitrary function of that value and even of the float hardware designation - an opaque token.)
It is more feasible to use the source implementation to do the float32 extraction (at the time of dump or task migration), which appears to require explicit notification of the freg's current value type.
Any hardware implementation must already have a way of saving the IEEE representation of a 32-bit float stored in a 64/65-bit internal FPU register, in order to implement FSW and FMV, so requiring that it stuff that representation into part of the encoding of an opaque FP register is not a big stretch. If we were to go down this path, that's the only sensible external encoding to use (plus some special NaN envelope). However, I believe the big pain point is the need for extra internal state to remember the type of the value stored in each register.
@DSHorner: Fully agreed; to be reasonably encoded in the config string you'd need to support the specification of a limited set of possible encodings. It might be enough to have the standard define a "recommended" encoding and have the config string indicate if it isn't used (i.e. you're on your own). I agree it's more general to rely on FCLASS; your task migration example is another good motivation for coming up with a solution here.
My understanding is that as it stands, there is no way to fully support migration of FP registers (e.g. by context switching a task to another core) on a heterogeneous cluster where it is unknown whether the FPU implementations are identical.
With @DSHorner's FCLASS suggestion, the context switch code would use FCLASS to determine the precision of the value in each f-register, save it with the matching store width, and record the precision tag so the restoring core can re-encode it (see the sketch below).
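A minimal sketch of the save half, assuming the hypothetical FCLASS precision bit from earlier and a low-level stub that has already captured the raw register images and FCLASS results; everything here is illustrative:

```c
#include <stdint.h>

struct fpr_slot {
    uint64_t bits;      /* raw 64-bit image of the f-register */
    uint8_t  is_single; /* precision tag for the restoring core */
};

/* fclass_bits[i] is the FCLASS.D result for f-register i; raw_regs[i] is
 * its 64-bit image. The restoring core uses is_single to decide whether
 * to re-encode the low 32 bits in its own boxed-single format. */
void tag_fp_state(const uint32_t fclass_bits[32],
                  const uint64_t raw_regs[32],
                  struct fpr_slot slots[32])
{
    for (int i = 0; i < 32; i++) {
        slots[i].bits      = raw_regs[i];
        slots[i].is_single = (fclass_bits[i] & (1u << 10)) != 0;
    }
}
```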
What do other platforms do here? Assuming the encoding is known to be the same, is it always safe+IEEE compliant to fsd an FPR used to hold a single precision value, then later fld it back into the register file and treat it as a single-precision value?
@asb Consider what happens if procedure A is using `fs0` for a float and calls procedure B, which also uses `fs0`: procedure B spills `fs0` to the stack using FSD. Which means that there are now recoded values on the stack. Which means that your "heterogeneous cluster migrator" now needs, at the very least, a DWARF unwinder to find all of the spill slots that might need rewriting, and most likely application knowledge as well (especially if the application is something like riscv/riscv-go that does "interesting" things with stacks).
I think migration is only going to happen between chips with compatible float formats, but I'm not really bothered by this since you already need to keep AT_HWCAP stuff the same.
@DSHorner That might work for Berkeley's 65-bit registers, but it doesn't work for spill slots halfway up the stack, and it won't work on implementations with 64-bit registers that store IEEE values directly (I suspect these will not be uncommon).
@sorear: Thanks, you're right that I was missing the bigger picture by just worrying about saving/restoring registers - fixing that is insufficient. Explicitly documenting that cores should use compatible float formats if they want to support process migration between them is one way to go; if this were done, the specification should suggest a preferred encoding so people without a reason to opt out at least cluster around a common solution. There are still problems.
Two additional considerations for coverage: with the introduction of float16 (half precision), will there be a requirement that FSW also support the dual-type guarantee? Applications must ensure they write external values in the correct format for the float type. The hazard introduced by embedded lesser formats is potentially high relative to the realized optimization, especially when there is no mitigation possible by directly checking/using an explicit type value.
As I now understand the problem, I think picking a standard encoding/serialisation is the sensible way forward. I've written up a summary document, and kicked off a discussion on isa-dev.
@asb Thank you so much for this: "I've written up a summary document, and kicked off a discussion on isa-dev."
This has been effectively resolved on the mailing list, but I will wait for @kasanovic's writeup to hit the repo before closing the issue.
NaN-boxing solution added to spec.
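Concretely, under the adopted scheme an f-register image holds a valid boxed binary32 only when its upper 32 bits are all ones; anything else is treated by single-precision instructions as the canonical NaN. A small C helper makes the consequence explicit:

```c
#include <stdint.h>

/* Valid boxed single: upper 32 bits all ones. */
static int is_boxed_single(uint64_t reg_image)
{
    return (reg_image >> 32) == 0xFFFFFFFFu;
}

/* What an F-type instruction effectively sees: the low 32 bits if the
 * box is valid, otherwise the canonical quiet NaN (0x7FC00000). */
static uint32_t effective_single(uint64_t reg_image)
{
    return is_boxed_single(reg_image) ? (uint32_t)reg_image : 0x7FC00000u;
}
```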
I believe the following statements in the spec are necessary but insufficient:
> FSD and FMV.X.D should be defined to create the same implementation-defined values as each other, and FLD and FMV.D.X should restore them equivalently. In particular, FSD followed by LD and FMV.D.X should properly recreate the single-precision value, as should FMV.X.D followed by SD and FLD.
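These requirements are directly testable. A RISC-V-only sketch, assuming GCC-style inline asm (the "f" constraint names the whole f-register, so FSD and FMV.X.D see the full boxed contents):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t image_via_fsd(float f)
{
    uint64_t mem;
    __asm__("fsd %1, %0" : "=m"(mem) : "f"(f));
    return mem;
}

static uint64_t image_via_fmv(float f)
{
    uint64_t x;
    __asm__("fmv.x.d %0, %1" : "=r"(x) : "f"(f));
    return x; /* an SD of x would write exactly these bits */
}

int main(void)
{
    volatile float f = 1.5f; /* a live single-precision value in an f-reg */
    assert(image_via_fsd(f) == image_via_fmv(f));
    return 0;
}
```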