regexident closed this pull request 6 years ago.
Is this actually useful to you for integers? Otherwise, you can just use `Float::mul_add`.

> Is this actually useful to you for integers? Otherwise, you can just use `Float::mul_add`.
It's useful for writing code that makes use of the optimization if applicable, yet remains fully generic. A particular use case is fixed-point arithmetic on `T: Real + MulAdd` as a convenient drop-in replacement for `T: Float + MulAdd`.
I'm currently working on and off on making japaric/fpa usable as a proper replacement for `Float` on embedded/`no_std` targets without an FPU. I'd like to be able to write algebraic code that makes full use of all available optimized code paths where applicable, yet remains code-compatible with environments of reduced sophistication.
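For illustration, the kind of generic code I have in mind looks roughly like this (just a sketch; `horner` is a made-up helper, and the exact bounds depend on the final shape of the `MulAdd` trait):

```rust
use num_traits::{MulAdd, Zero};

/// Horner evaluation of a polynomial (hypothetical helper, not part of this PR).
/// Whether `mul_add` is a fused operation or a plain `a * b + c` is up to the impl.
fn horner<T>(coefficients: &[T], x: T) -> T
where
    T: MulAdd<Output = T> + Zero + Copy,
{
    coefficients
        .iter()
        .rev()
        .fold(T::zero(), |acc, &c| acc.mul_add(x, c))
}
```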
> This produces a more accurate result with better performance than a separate multiplication operation followed by an add.
Unfortunately, in my experience this is not necessarily true. For `f64` I don't see a performance difference on my machine, and for `f32` it is slower without `target-cpu=native` and faster with it.
> Unfortunately, in my experience this is not necessarily true.

This does not invalidate the desire to have a way to generically express `fn mul_add` through a trait though, or does it?
No, performance is not really related to the changes in this PR. What might be problematic is that `a * b + c` is less precise than `a.mul_add(b, c)` and will yield different results. I'm not sure how big of a problem that is in practice, but it should probably be documented.
Also see https://github.com/rust-lang/rust/pull/44805:

> It's not just about exact results, it's also about reasoning about how inexact the result can get, and having particular behavior if an argument or the intermediate product is non-finite. For an example of the latter, consider `fma(MAX_FLT, MAX_FLT, NEG_INFINITY)` (evaluates to -inf) vs `(MAX_FLT * MAX_FLT) + NEG_INFINITY` (evaluates to NaN).
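Spelling that out on `f64` (a quick sketch):

```rust
fn main() {
    let max = f64::MAX;
    let neg_inf = f64::NEG_INFINITY;

    // Fused: the huge-but-finite product max * max is combined with the
    // -inf addend in a single step, so the result rounds to -inf.
    println!("{}", max.mul_add(max, neg_inf)); // -inf

    // Unfused: max * max already overflows to +inf, and +inf + -inf is NaN.
    println!("{}", max * max + neg_inf); // NaN
}
```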
> No, performance is not really related to the changes in this PR. What might be problematic is that `a * b + c` is less precise than `a.mul_add(b, c)` and will yield different results. I'm not sure how big of a problem that is in practice, but it should probably be documented.
Good point, I'll gladly add a mention of this to the documentation. :)
Any further feedback? What needs to be done to proceed? Do we want to proceed?
> Is this actually useful to you for integers? Otherwise, you can just use `Float::mul_add`.
>
> It's useful for writing code that makes use of the optimization if applicable, yet remains fully generic. A particular use case is fixed-point arithmetic on `T: Real + MulAdd` as a convenient drop-in replacement for `T: Float + MulAdd`.
Note there's `Real::mul_add` too. Supporting the `no_std` fpa crate is a more compelling example though, since `Float` and `Real` are only available in `std` builds.
I still think we should not have a `(self * a) + b` fallback for `no_std` floats -- I'd rather not implement the trait for `no_std` floats at all if we can't meet the same rounding accuracy. The performance is secondary, maybe not even worth bringing up.
I just moved the impls of `MulAdd`/`MulAddAssign` for `f32`/`f64` behind the `#[cfg(feature = "std")]` feature guard with commit 28be885.
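Roughly along these lines (a simplified sketch of the gated `f32` impls; commit 28be885 has the actual code):

```rust
#[cfg(feature = "std")]
impl MulAdd for f32 {
    type Output = f32;

    #[inline]
    fn mul_add(self, a: f32, b: f32) -> f32 {
        // Delegates to the std fused multiply-add.
        f32::mul_add(self, a, b)
    }
}

#[cfg(feature = "std")]
impl MulAddAssign for f32 {
    #[inline]
    fn mul_add_assign(&mut self, a: f32, b: f32) {
        *self = f32::mul_add(*self, a, b);
    }
}
```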
bors r+
Both `f32` and `f64` implement fused multiply-add, which computes `(self * a) + b` with only one rounding error. This produces a more accurate result with better performance than a separate multiplication operation followed by an add.
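For instance (an illustrative snippet, similar in spirit to the example in the `std` docs):

```rust
fn main() {
    let m = 10.0_f64;
    let x = 4.0_f64;
    let b = 60.0_f64;

    // Fused multiply-add: one operation, one rounding step.
    let fused = m.mul_add(x, b);
    // Separate multiply and add: two operations, two rounding steps.
    let separate = m * x + b;

    assert!((fused - separate).abs() <= f64::EPSILON);
    assert_eq!(fused, 100.0);
}
```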
It is however not possible to make use of this in a generic context by abstracting over a trait.

My concrete use-case is machine learning, gradient descent to be specific, where the core operation of updating the gradient could make use of `mul_add` for both its `weights: Vector` as well as its `bias: f32`, as sketched below. (The actual impl of `Vector` would be generic over its value type, `Vector<T>`, thus requiring the trait.)
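A rough sketch of that update step (hypothetical types and names; `Vector`, `update`, and `learning_rate` are placeholders, not an existing API):

```rust
use num_traits::MulAdd;

/// Hypothetical dense vector, generic over its value type.
struct Vector<T>(Vec<T>);

/// One gradient-descent step: w <- g * (-lr) + w, b <- gb * (-lr) + b.
fn update<T>(
    weights: &mut Vector<T>,
    gradient: &Vector<T>,
    bias: &mut T,
    bias_gradient: T,
    learning_rate: T,
) where
    T: MulAdd<Output = T> + Copy + std::ops::Neg<Output = T>,
{
    for (w, &g) in weights.0.iter_mut().zip(gradient.0.iter()) {
        *w = g.mul_add(-learning_rate, *w);
    }
    *bias = bias_gradient.mul_add(-learning_rate, *bias);
}
```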