regexident closed this pull request 6 years ago.
Is this actually useful to you for integers? Otherwise, you can just use `Float::mul_add`.

> Is this actually useful to you for integers? Otherwise, you can just use `Float::mul_add`.
It's useful for writing code that makes use of the optimization if applicable, yet remains fully generic. A particular use case is fixed-point arithmetic on `T: Real + MulAdd` as a convenient drop-in replacement for `T: Float + MulAdd`.
I'm currently working on and off on making japaric/fpa usable as a proper replacement for `Float` on embedded/`no_std` targets without an FPU. I'd like to be able to write algebraic code that makes full use of all available optimized code paths where applicable, yet remains code-compatible with environments of reduced sophistication.
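For illustration, the kind of generic code I have in mind looks roughly like this (just a sketch; `horner` is a made-up helper, and the exact bounds depend on the final shape of the `MulAdd` trait):

```rust
use num_traits::{MulAdd, Zero};

/// Horner evaluation of a polynomial (hypothetical helper, not part of this PR).
/// Whether `mul_add` is a fused operation or a plain `a * b + c` is up to the impl.
fn horner<T>(coefficients: &[T], x: T) -> T
where
    T: MulAdd<Output = T> + Zero + Copy,
{
    coefficients
        .iter()
        .rev()
        .fold(T::zero(), |acc, &c| acc.mul_add(x, c))
}
```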
> This produces a more accurate result with better performance than a separate multiplication operation followed by an add.
Unfortunately, in my experience this is not necessarily true. For `f64` I don't see a performance difference on my machine, and for `f32` it is slower without `target-cpu=native` and faster with it.
> Unfortunately, in my experience this is not necessarily true.

This does not invalidate the desire to have a way to generically express `fn mul_add` through a trait though, or does it?
No, performance is not really related to the changes in this PR. What might be problematic is that `a * b + c` is less precise than `a.mul_add(b, c)` and will yield different results. I'm not sure how big of a problem that is in practice, but it should probably be documented.
Also see https://github.com/rust-lang/rust/pull/44805:

> It's not just about exact results, it's also about reasoning about how inexact the result can get, and having particular behavior if an argument or the intermediate product is non-finite. For an example of the latter, consider `fma(MAX_FLT, MAX_FLT, NEG_INFINITY)` (evaluates to -inf) vs `(MAX_FLT * MAX_FLT) + NEG_INFINITY` (evaluates to NaN).
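Spelling that out on `f64` (a quick sketch):

```rust
fn main() {
    let max = f64::MAX;
    let neg_inf = f64::NEG_INFINITY;

    // Fused: the huge-but-finite product max * max is combined with the
    // -inf addend in a single step, so the result rounds to -inf.
    println!("{}", max.mul_add(max, neg_inf)); // -inf

    // Unfused: max * max already overflows to +inf, and +inf + -inf is NaN.
    println!("{}", max * max + neg_inf); // NaN
}
```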
> No, performance is not really related to the changes in this PR. What might be problematic is that `a * b + c` is less precise than `a.mul_add(b, c)` and will yield different results. I'm not sure how big of a problem that is in practice, but it should probably be documented.
Good point, I'll gladly add a mention of this to the documentation. :)
Any further feedback? What needs to be done to proceed? Do we want to proceed?
> Is this actually useful to you for integers? Otherwise, you can just use `Float::mul_add`.
>
> It's useful for writing code that makes use of the optimization if applicable, yet remains fully generic. A particular use case is fixed-point arithmetic on `T: Real + MulAdd` as a convenient drop-in replacement for `T: Float + MulAdd`.
Note there's `Real::mul_add` too. Supporting the `no_std` fpa crate is a more compelling example though, since `Float` and `Real` are only available in `std` builds.
I still think we should not have a `(self * a) + b` fallback for `no_std` floats -- I'd rather not implement the trait for `no_std` floats at all if we can't meet the same rounding accuracy. The performance is secondary, maybe not even worth bringing up.
I just moved the impls of `MulAdd`/`MulAddAssign` for `f32`/`f64` behind the `#[cfg(feature = "std")]` feature guard with commit 28be885.
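Roughly along these lines (a simplified sketch of the gated `f32` impls; commit 28be885 has the actual code):

```rust
#[cfg(feature = "std")]
impl MulAdd for f32 {
    type Output = f32;

    #[inline]
    fn mul_add(self, a: f32, b: f32) -> f32 {
        // Delegates to the std fused multiply-add.
        f32::mul_add(self, a, b)
    }
}

#[cfg(feature = "std")]
impl MulAddAssign for f32 {
    #[inline]
    fn mul_add_assign(&mut self, a: f32, b: f32) {
        *self = f32::mul_add(*self, a, b);
    }
}
```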
bors r+
Both `f32` and `f64` implement fused multiply-add, which computes `(self * a) + b` with only one rounding error. This produces a more accurate result with better performance than a separate multiplication operation followed by an add.
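For instance (an illustrative snippet, similar in spirit to the example in the `std` docs):

```rust
fn main() {
    let m = 10.0_f64;
    let x = 4.0_f64;
    let b = 60.0_f64;

    // Fused multiply-add: one operation, one rounding step.
    let fused = m.mul_add(x, b);
    // Separate multiply and add: two operations, two rounding steps.
    let separate = m * x + b;

    assert!((fused - separate).abs() <= f64::EPSILON);
    assert_eq!(fused, 100.0);
}
```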
It is however not possible to make use of this in a generic context by abstracting over a trait.

My concrete use-case is machine learning, gradient descent to be specific, where the core operation of updating the gradient could make use of `mul_add` for both its `weights: Vector` as well as its `bias: f32`, as sketched below. (The actual impl of `Vector` would be generic over its value type, `Vector<T>`, thus requiring the trait.)
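A rough sketch of that update step (hypothetical types and names; `Vector`, `update`, and `learning_rate` are placeholders, not an existing API):

```rust
use num_traits::MulAdd;

/// Hypothetical dense vector, generic over its value type.
struct Vector<T>(Vec<T>);

/// One gradient-descent step: w <- g * (-lr) + w, b <- gb * (-lr) + b.
fn update<T>(
    weights: &mut Vector<T>,
    gradient: &Vector<T>,
    bias: &mut T,
    bias_gradient: T,
    learning_rate: T,
) where
    T: MulAdd<Output = T> + Copy + std::ops::Neg<Output = T>,
{
    for (w, &g) in weights.0.iter_mut().zip(gradient.0.iter()) {
        *w = g.mul_add(-learning_rate, *w);
    }
    *bias = bias_gradient.mul_add(-learning_rate, *bias);
}
```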