Open zesterer opened 3 years ago
That reminds me of something. Very recently, for a reason I don't remember, I ended up reading the definition of FMA on Wikipedia, and I realized something important: strictly speaking, `mul_add(a, b, c)` is not the same as `a * b + c`. I used to believe that this was mostly a matter of performance, but in fact `mul_add` provides an extra guarantee about the precision of the result, which is enough to treat it as a different operation.

The important part I realized is that when a user calls `mul_add()`, they expect the guarantees of a real FMA operation (at the hardware level); because if they didn't, they would have written `a * b + c` as usual.
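As a minimal sketch of that difference (my own illustration, not from the docs), assuming `mul_add` really does honour the single-rounding contract: a fused operation can recover the rounding error of a product, whereas the unfused expression collapses to zero.

```rust
fn main() {
    // The exact product x*x = 1 + 2^-51 + 2^-104 does not fit in an f64.
    let x = 1.0_f64 + f64::EPSILON;

    let p = x * x;                 // rounded product: 1 + 2^-51
    let unfused = x * x - p;       // two roundings: exactly 0.0
    let fused = x.mul_add(x, -p);  // one rounding: the exact residual, 2^-104

    assert_eq!(unfused, 0.0);
    assert_ne!(fused, 0.0);
    println!("unfused = {unfused:e}, fused = {fused:e}");
}
```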
So I need to check what `core`'s `MulAdd` has to say about this, but if my assumption is correct, `MulAdd` does not make sense for types which do not have hardware support for FMA. I'm not fundamentally against a generic `MulAdd`, but I realized it may become a matter of "are we misinforming the user?". I'll look into this.
Perhaps there could be a convenience type like `struct MAdd<T>(T);` that adds a `MulAdd` implementation in terms of `Mul` and `Add`, to allow use with non-standard types?
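For what it's worth, a rough sketch of such a wrapper against `num-traits`' `MulAdd` trait might look like this (the `MAdd` name and bounds are just illustrative); note that it can only provide the unfused `self * a + b`, with two roundings:

```rust
use core::ops::{Add, Mul};
use num_traits::MulAdd;

/// Hypothetical convenience newtype: gives any `T: Mul + Add` a `MulAdd` impl,
/// at the cost of the single-rounding guarantee (it is a plain multiply-then-add).
pub struct MAdd<T>(pub T);

impl<T> MulAdd for MAdd<T>
where
    T: Mul<Output = T> + Add<Output = T>,
{
    type Output = MAdd<T>;

    fn mul_add(self, a: MAdd<T>, b: MAdd<T>) -> MAdd<T> {
        // Two rounding steps for float-like types: multiply, then add.
        MAdd(self.0 * a.0 + b.0)
    }
}
```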
Btw, since the fix for #73 that I've published just now, `vek` just re-exports `num-traits`' `MulAdd` instead. I think it's for the better...
Both `std` and `num-traits` seem to agree on the following:

> Computes (self * a) + b with only one rounding error, yielding a more accurate result than an unfused multiply-add. Using mul_add may be more performant than an unfused multiply-add if the target architecture has a dedicated fma CPU instruction
According to this, it looks like they guarantee the "only one rounding error" part; what they don't guarantee is acceleration.
I didn't take the time to dig into `intrinsics::fmaf64()` to see what it really does, but as far as I can infer from this documentation (and that's probably what we should stick to), they would be willing to have a software implementation of FMA if the CPU doesn't have it; and that implementation is probably not `a * b + c`, since that may cause two rounding errors.
Though, looking back at your initial proposal, I wonder: is it only `Mat4` that suffers from the issue?
Also, if the goal is for `MulAdd` to work with non-standard types, then I would suggest poking at the authors of these types to make them implement `num-traits`' `MulAdd`, and it should just work, right?
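To illustrate that suggestion (the type and its representation are purely hypothetical on my part), an author of, say, a fixed-point type could implement `num-traits`' `MulAdd` directly, and even preserve the one-rounding spirit by widening internally:

```rust
use num_traits::MulAdd;

/// Hypothetical 16.16 fixed-point type standing in for a "non-standard" numeric type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Fix32(pub i32); // value = raw / 2^16

impl MulAdd for Fix32 {
    type Output = Fix32;

    fn mul_add(self, a: Fix32, b: Fix32) -> Fix32 {
        // Keep the full-width product and add `b` before the single truncation,
        // mirroring the "only one rounding error" wording of the fused operation.
        // (Overflow handling is omitted for brevity.)
        let wide = self.0 as i64 * a.0 as i64 + ((b.0 as i64) << 16);
        Fix32((wide >> 16) as i32)
    }
}
```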
Thanks for making this switch! The reason I opened the issue is that I'm working on a platform without any real acceleration hardware or intrinsics at all, so falling back to a soft impl seems perfectly reasonable to me. I think this will now work fine.
`MulAdd` is an internal trait but may be implemented in terms of traits in `core`. The ability to relax this bound would allow `Mat4` to work with non-standard numeric types.
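For reference, a simplified stand-in for what the relaxed bound could look like (a 2x2 matrix rather than `Mat4`, element-wise semantics assumed): the method only asks for `Mul` and `Add` from `core`, so any numeric type with those impls works, at the cost of the fused-rounding guarantee.

```rust
use core::ops::{Add, Mul};

/// Hypothetical, simplified matrix used only to illustrate the relaxed bound.
pub struct Mat2<T> {
    pub rows: [[T; 2]; 2],
}

impl<T: Copy + Mul<Output = T> + Add<Output = T>> Mat2<T> {
    /// Element-wise `self * a + b`, with no single-rounding guarantee.
    pub fn mul_add(self, a: Mat2<T>, b: Mat2<T>) -> Mat2<T> {
        let mut rows = self.rows;
        for i in 0..2 {
            for j in 0..2 {
                rows[i][j] = self.rows[i][j] * a.rows[i][j] + b.rows[i][j];
            }
        }
        Mat2 { rows }
    }
}
```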