rust-lang / rust


Tracking Issue for fNN::lerp #86269

Closed clarfonthey closed 2 years ago

clarfonthey commented 3 years ago

Feature gate: #![feature(float_interpolation)]

This is a tracking issue for the fNN::lerp methods in the standard library.

Public API

impl fNN {
    fn lerp(self, start: Self, end: Self) -> Self;
}
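
For illustration, a minimal usage sketch of the proposed calling convention (this only compiled on a nightly toolchain while the feature gate existed; the values here are exactly representable, so any of the candidate implementations returns the same result):

#![feature(float_interpolation)]

fn main() {
    // the receiver is the interpolation factor t, so the call reads t.lerp(start, end)
    let mid = 0.5_f32.lerp(2.0, 6.0);
    assert_eq!(mid, 4.0);
}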

Steps / History

Unresolved Questions

leonardo-m commented 3 years ago

Considering this isn't C++, don't you want to give it a more descriptive name, like "linear_interpolation"?

camsteffen commented 3 years ago

Considering this isn't C++, don't you want to give it a more descriptive name, like "linear_interpolation"?

In this particular case I think it is good to copy the name. It is associated with this specific formula and the convention is used in other libraries and languages.

I think it is better to not have a self parameter for the same reason min/max functions do not have a self parameter. None of the inputs are self-evidently the "primary" input. a and b are a pair and t is the "other" parameter, but that does not make t "primary". Perhaps you may conceptualize the formula in a way that t feels primary, but that is not the most natural/common way to conceptualize it IMO. I think it is more natural to think of the inputs in the order a, b, t. "Start. End. Position between the two."

leonardo-m commented 3 years ago

In Rust you can write x.min(y).

camsteffen commented 3 years ago

In Rust you can write x.min(y).

Huh TIL 👆

clarfonthey commented 3 years ago

Yeah, lerp is a standard name for this, like signum, min, or max. We don't write out minimum or maximum for those methods. ;)

CAD97 commented 3 years ago

Previously on irlo

Should we have all of the guarantees of std::lerp from C++?

Given that a significant part of the advantage to lerp is that it's super fast and trivial, it'll be an interesting sell to use an implementation other than the two standard ones (v0 + t * ( v1 - v0 ) (monotonic) or ( 1 - t ) * v0 + t * v1 (exact for t=1)). Especially on e.g. the GPU, being fast and branchless might be prioritized over being exact, consistent, and monotonic. Exact and consistent are commonly selected for, but monotonicity is rarely considered key, given lerp is typically (in my domain, games) used primarily for visual output, where small deviation doesn't matter much. That said, the given implementation of FMA( FMA( v0, v0, -t ), t, v1 ) (that is, (v0 + (v0 * -t)) + (t * v1)) is promising, as it's basically equivalent to the second, but...

I'm only 90% certain that it's monotonic, but I'm sure that people who care deeply about this will be there to discuss before stabilisation.

It's just f32, so we can check a full series for a given min/max, and... it's not monotonic. As just one example,

Not monotonic: lerp(0.24997729, 0.27129978, 0.9048672) = 0.42967725
             > lerp(0.2499773 , 0.27129978, 0.9048672) = 0.42967722

(Min and max were just chosen randomly here.) My code found 978813 pairs of adjacent float ts that were not monotonic. But that said, I don't think we should care about monotonicity. What actually potentially matters, in my eyes at least, is boundedness; that is, that for any (finite) a, b, t, a <= t.lerp(a, b) && t.lerp(a, b) <= b (flipped if b < a). As far as I've found, this implementation does uphold that requirement.
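
A minimal sketch of that kind of exhaustive sweep (not the exact harness used above), checking every adjacent pair of f32 t values in [0, 1] against the double-FMA form:

fn lerp(t: f32, v0: f32, v1: f32) -> f32 {
    // the double-FMA form: t*v1 + (v0 + v0*(-t))
    t.mul_add(v1, v0.mul_add(-t, v0))
}

fn main() {
    // the endpoints from the example above; t ~ 0.2499773 is one reported violation
    let (v0, v1) = (0.27129978_f32, 0.9048672_f32);
    let mut violations = 0u64;
    let mut t = 0.0_f32;
    while t < 1.0 {
        let next = f32::from_bits(t.to_bits() + 1); // next representable f32
        if lerp(t, v0, v1) > lerp(next, v0, v1) {
            violations += 1; // result decreased while t increased: not monotonic
        }
        t = next;
    }
    println!("non-monotonic adjacent pairs: {violations}");
}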

Is the calling convention good?

According to Wikipedia at least, both (v0, v1, t) and (t, v0, v1) are in use. For method use, t.lerp_between(a, b) seems unambiguous to me, but f32::lerp_between(t, a, b) is ambiguous again. A small survey of existing math/linalg crates' implementations (collected from mint rdeps):

emath has a great solution to use here: RangeInclusive!

That is, use f32::lerp(self, range: RangeInclusive<f32>), so it's called as t.lerp(min..=max) or f32::lerp(t, min..=max). No confusion necessary!
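
A minimal sketch of that calling convention as a free-standing extension trait (the trait name and exact formula here are illustrative, not emath's or std's actual API):

use std::ops::RangeInclusive;

trait LerpRange {
    fn lerp(self, range: RangeInclusive<f32>) -> f32;
}

impl LerpRange for f32 {
    // self is the interpolation factor t; the range carries both endpoints
    fn lerp(self, range: RangeInclusive<f32>) -> f32 {
        let (start, end) = (*range.start(), *range.end());
        (1.0 - self) * start + self * end
    }
}

fn main() {
    assert_eq!(0.25_f32.lerp(2.0..=6.0), 3.0); // reads as t.lerp(min..=max)
}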

Should we have an inverse lerp method? What should it be called?

As far as I can tell, the only somewhat common name is "inverse lerp" or "linearstep" (in contrast to smoothstep).
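
For reference, a minimal sketch of what such an inverse lerp (a.k.a. linearstep) computes; the function name is just for illustration:

fn inverse_lerp(v0: f32, v1: f32, x: f32) -> f32 {
    // recover the factor t such that lerp(t, v0, v1) would (approximately) produce x
    (x - v0) / (v1 - v0)
}

fn main() {
    assert_eq!(inverse_lerp(2.0, 6.0, 3.0), 0.25);
}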

CAD97 commented 3 years ago

An alternative option is to add lerp to RangeInclusive<fNN> instead of fNN, so that it's RangeInclusive::lerp(min..=max, t) or (min..=max).lerp(t). This would be interesting, but I think potentially problematic for the core/std split? As well as much more difficult to discover.
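
A minimal sketch of that alternative spelling, using a hypothetical local trait (so the impl on RangeInclusive<f32> is coherent in this example):

use std::ops::RangeInclusive;

trait RangeLerp<T> {
    type Output;
    fn lerp(&self, t: T) -> Self::Output;
}

impl RangeLerp<f32> for RangeInclusive<f32> {
    type Output = f32;
    fn lerp(&self, t: f32) -> f32 {
        (1.0 - t) * self.start() + t * self.end()
    }
}

fn main() {
    assert_eq!((2.0_f32..=6.0).lerp(0.25), 3.0); // reads as (min..=max).lerp(t)
}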

koalefant commented 3 years ago

The current choice of lerp signature t.lerp(a, b) appears to be unfortunate for two reasons:

First, it is impossible to be consistent with the f32 implementation when implementing lerp for custom algebraic types (such as Vec3 or Quat). One would not be able to extend f32 in another crate to add interpolation for additional types.

koalefant commented 3 years ago

Second, it is inconsistent with most existing algebraic crates in the ecosystem. As @CAD97 has mentioned above: nalgebra, euclid, glam, cgmath, kurbo, ultraviolet and vek all use the a.lerp(b, t) signature. That is a lot of breaking changes to stay consistent with the new std implementation.

clarfonthey commented 3 years ago

Note that there is the unfortunate but possible implementation of requiring a function call style, e.g. f32::lerp(a, b, t), which would not be equivalent to those other implementations but would be much less breaking.

koalefant commented 3 years ago

Note that there is the unfortunate but possible implementation of requiring a function call style, e.g. f32::lerp(a, b, t), which would not be equivalent to those other implementations but would be much less breaking.

One downside is that it would require type names to be stated explicitly, which is not the case with many other methods such as clamp() or min()/max(). And this only addresses the first issue.

CAD97 commented 3 years ago

@koalefant

That is an angle I hadn't considered, and it holds even for the RangeInclusive option. v0.lerp(v1, t) can be applied to any custom lerpable type, as they can always implement their own methods. If we want the ecosystem to be able to also use t.lerp(v0, v1), though, we need to use a Lerp trait as an extension point, roughly

trait Lerp: Sized {
    fn lerp32(t: f32, v0: Self, v1: Self) -> Self { Self::lerp64(t as f64, v0, v1) }
    fn lerp64(t: f64, v0: Self, v1: Self) -> Self;
}

impl f32 {
    fn lerp<L: Lerp>(self, v0: L, v1: L) -> L { L::lerp32(self, v0, v1) }
}

impl f64 {
    fn lerp<L: Lerp>(self, v0: L, v1: L) -> L { L::lerp64(self, v0, v1) }
}

and it's starting to get kind of messy, due to how overloading works in Rust. Assuming we only lerp between two values of the same type, we effectively want to do double dispatch, on the bitwidth of t and (more importantly) the type of the value being interpolated.

That, at least, speaks to putting the dispatch on the interpolated type. This brings me back to the possibility of providing it as (v0..=v1).lerp(t), as downstream can add trait implementations to add new lerp to different concretizations of RangeInclusive. (Though it makes me wish more that range types were Copy + IntoIterator instead of Iterator... but w/e that ship has sailed long ago.)

Although, on the other hand, the intricacy here makes me wonder whether lerp (for scalars) should live in num, for the same reason num was extracted from the stdlib to begin with.

koalefant commented 3 years ago

@koalefant

trait Lerp: Sized {
    fn lerp32(t: f32, v0: Self, v1: Self) -> Self { Self::lerp64(t as f64, v0, v1) }
    fn lerp64(t: f64, v0: Self, v1: Self) -> Self;
}

impl f32 {
    fn lerp<L: Lerp>(self, v0: L, v1: L) -> L { L::lerp32(self, v0, v1) }
}

impl f64 {
    fn lerp<L: Lerp>(self, v0: L, v1: L) -> L { L::lerp64(self, v0, v1) }
}

I feel like it is a smart but complicated solution to this problem.

koalefant commented 3 years ago

Although, on the other hand, the intricacy here makes me wonder whether lerp (for scalars) should live in num, for the same reason num was extracted from the stdlib to begin with.

Consider that some implementations of lerp are not identical to the f32 one. For example, quaternions are normalized in glam after interpolation:

    fn lerp(self, end: Self, s: T) -> Self {
        // ...
        interpolated.normalize()
    }

CAD97 commented 3 years ago

(GitHub issues are not a great medium for in-depth back-and-forth: consider discussing in the zulip instead.)


Here are a few notes about guarantees. For finite inputs, we would like to provide guarantees that the function is

exact: 0.0.lerp(v0, v1) == v0 and 1.0.lerp(v0, v1) == v1
monotonic: cmp(v0, v1) == cmp(t.lerp(v0, v1), nextfloat(t).lerp(v0, v1))
consistent: t.lerp(v, v) == v
bounded: for t in 0.0..=1.0, t.lerp(v0, v1) in min(v0, v1)..=max(v0, v1)

Note that bounded implies consistent (for t in 0.0..=1.0), and that an implementation being bounded and/or consistent falls out of it being both exact and monotonic. However, if an implementation is not exact and/or not monotonic, it is better for it to still be bounded and/or consistent.

Separately, an implementation is said to be clamped if it returns a value in min(v0, v1)..=max(v0, v1) no matter the t (it logically clamps t into 0.0..=1.0). An implementation is unclamped if it allows t values outside 0.0..=1.0 to extrapolate outside of that range.
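
To make the clamped/unclamped distinction concrete, a small sketch (the function names are mine, not proposed API):

fn lerp_unclamped(t: f32, v0: f32, v1: f32) -> f32 {
    t.mul_add(v1 - v0, v0)
}

fn lerp_clamped(t: f32, v0: f32, v1: f32) -> f32 {
    // logically clamp t into 0.0..=1.0 before interpolating
    lerp_unclamped(t.clamp(0.0, 1.0), v0, v1)
}

fn main() {
    assert_eq!(lerp_unclamped(2.0, 0.0, 10.0), 20.0); // extrapolates past v1
    assert_eq!(lerp_clamped(2.0, 0.0, 10.0), 10.0);   // stays within min(v0, v1)..=max(v0, v1)
}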


For each potential implementation, I'll provide the x64 assembly as generated by godbolt compiler explorer for both lerp and for a lerp between two known, increasing prime values, as an eyeball of relative performance characteristics, both with the default CPU target and with target-cpu=skylake (which allows inlining FMA). I think using relaxed math and allowing the compiler to provide the FMA in optimization might be better, depending on exact guarantees and such.


The simplest implementation (and the one that many gamedevs will reach for initially) is monotonic and consistent, but not quite exact nor bounded.

pub fn lerp(t: f32, v0: f32, v1: f32) -> f32 {
    t.mul_add(v1 - v0, v0)
}
ASM ```asm ; -O example::lerp: movaps xmm3, xmm1 subss xmm2, xmm1 movaps xmm1, xmm2 movaps xmm2, xmm3 jmp qword ptr [rip + fmaf@GOTPCREL] example::lerp_known: movss xmm1, dword ptr [rip + .LCPI1_0] movss xmm2, dword ptr [rip + .LCPI1_1] jmp qword ptr [rip + fmaf@GOTPCREL] ; -O -target-cpu=skylake example::lerp: vsubss xmm2, xmm2, xmm1 vfmadd213ss xmm0, xmm2, xmm1 ret example::lerp_known: vmovss xmm1, dword ptr [rip + .LCPI1_0] vfmadd213ss xmm0, xmm1, dword ptr [rip + .LCPI1_1] ret ```

This is monotonic because as t increases, the result of the FMA strictly increases. It's consistent, because when v0 == v1, v1 - v0 == 0.0, so the FMA will just return v0. This is not exact (and similarly not bounded) because (v1 - v0) + v0 may not necessarily equal v1 for values of significantly different enough scale, and will even overflow if v0 and v1 are of largest exponent and opposite signs. (IIUC, it could vary by up to ±1 ULP if the subtraction and addition round in the same direction, if no overflow happens.)
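
As a concrete illustration of the scale problem (my own example, not from the original discussion): near 1.0e8 adjacent f32 values are 8 apart, so v1 - v0 rounds and the t = 1 endpoint is missed entirely.

fn lerp(t: f32, v0: f32, v1: f32) -> f32 {
    t.mul_add(v1 - v0, v0)
}

fn main() {
    // 1.0 - 1.0e8 rounds to exactly -1.0e8, so the FMA cancels v0 completely:
    // the result at t = 1 is 0.0 rather than v1 = 1.0.
    assert_eq!(lerp(1.0, 1.0e8, 1.0), 0.0);
}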

This might be enough if we say that lerp is not exact but is accurate to ±1 ULP rather than ±½ ULP (and document the overflow condition), assuming my understanding is correct.


While it does not recover the exact quality, it's rather trivial to recover boundedness for a clamped lerp, by clamping to the range of v0, v1, or even just clamping v1.

pub fn lerp(t: f32, v0: f32, v1: f32) -> f32 {
    // monotonic, consistent
    let naive = t.mul_add(v1 - v0, v0);
    // bounded
    if v0 <= v1 {
        naive.clamp(v0, v1)
    } else {
        naive.clamp(v1, v0)
    }
}
ASM ```asm ; -O example::lerp: push rax movss dword ptr [rsp + 4], xmm2 movaps xmm3, xmm1 movss dword ptr [rsp], xmm1 movaps xmm1, xmm2 subss xmm1, xmm3 movaps xmm2, xmm3 call qword ptr [rip + fmaf@GOTPCREL] movss xmm2, dword ptr [rsp + 4] movss xmm1, dword ptr [rsp] ucomiss xmm2, xmm1 jae .LBB0_3 ucomiss xmm1, xmm2 jb .LBB0_5 maxss xmm2, xmm0 minss xmm1, xmm2 jmp .LBB0_4 .LBB0_3: maxss xmm1, xmm0 minss xmm2, xmm1 movaps xmm1, xmm2 .LBB0_4: movaps xmm0, xmm1 pop rax ret .LBB0_5: lea rdi, [rip + .L__unnamed_1] lea rdx, [rip + .L__unnamed_2] mov esi, 28 call qword ptr [rip + core::panicking::panic@GOTPCREL] ud2 example::lerp_known: push rax movss xmm1, dword ptr [rip + .LCPI1_0] movss xmm2, dword ptr [rip + .LCPI1_1] call qword ptr [rip + fmaf@GOTPCREL] movss xmm1, dword ptr [rip + .LCPI1_1] maxss xmm1, xmm0 movss xmm0, dword ptr [rip + .LCPI1_2] minss xmm0, xmm1 pop rax ret ; -O -target-cpu=skylake example::lerp: vsubss xmm3, xmm2, xmm1 vfmadd213ss xmm3, xmm0, xmm1 vucomiss xmm2, xmm1 jae .LBB0_3 vucomiss xmm1, xmm2 jb .LBB0_5 vmaxss xmm0, xmm2, xmm3 vminss xmm0, xmm1, xmm0 ret .LBB0_3: vmaxss xmm0, xmm1, xmm3 vminss xmm0, xmm2, xmm0 ret .LBB0_5: push rax lea rdi, [rip + .L__unnamed_1] lea rdx, [rip + .L__unnamed_2] mov esi, 28 call qword ptr [rip + core::panicking::panic@GOTPCREL] ud2 example::lerp_known: vmovss xmm1, dword ptr [rip + .LCPI1_0] vfmadd132ss xmm0, xmm1, dword ptr [rip + .LCPI1_1] vmaxss xmm0, xmm1, xmm0 vmovss xmm1, dword ptr [rip + .LCPI1_2] vminss xmm0, xmm1, xmm0 ret ```

This, if I understand correctly, fulfills all of the above defined properties for a clamped lerp, except that the returned value for t=1 may be 1 ULP short.

It's also worth noting that while the prior implementation contains no panicking code paths, this implementation still contains panicking code, I assume for some combination of nonfinite inputs.


This is the other simple implementation, and the one the other portion of gamedevs will reach for. I've written the simple definition you'll often see, but the current double-FMA implementation is algebraically equivalent to it, just with (1 - t) * v0 expanded to v0 + v0 * (-t).

pub fn lerp(t: f32, v0: f32, v1: f32) -> f32 {
    ((1.0 - t) * v0) + (t * v1)
}
ASM Note that this doesn't use explicit FMA, but _could_ use at least one. ```asm ; -O example::lerp: movss xmm3, dword ptr [rip + .LCPI0_0] subss xmm3, xmm0 mulss xmm3, xmm1 mulss xmm2, xmm0 addss xmm3, xmm2 movaps xmm0, xmm3 ret example::lerp_known: movss xmm1, dword ptr [rip + .LCPI1_0] subss xmm1, xmm0 mulss xmm1, dword ptr [rip + .LCPI1_1] mulss xmm0, dword ptr [rip + .LCPI1_2] addss xmm0, xmm1 ret ; -O -target-cpu=skylake example::lerp: vmovss xmm3, dword ptr [rip + .LCPI0_0] vsubss xmm3, xmm3, xmm0 vmulss xmm1, xmm3, xmm1 vmulss xmm0, xmm0, xmm2 vaddss xmm0, xmm1, xmm0 ret example::lerp_known: vmovss xmm1, dword ptr [rip + .LCPI1_0] vsubss xmm1, xmm1, xmm0 vmulss xmm1, xmm1, dword ptr [rip + .LCPI1_1] vmulss xmm0, xmm0, dword ptr [rip + .LCPI1_2] vaddss xmm0, xmm0, xmm1 ret ```

This implementation is exact, but it is not monotonic nor consistent, nor even bounded.


Again, clamping recovers boundedness and consistency, so I've included that implementation as well for completeness.

pub fn lerp(t: f32, v0: f32, v1: f32) -> f32 {
    let naive = ((1.0 - t) * v0) + (t * v1);
    if v0 < v1 {
        naive.clamp(v0, v1)
    } else {
        naive.clamp(v1, v0)
    }
}

pub fn lerp_known(t: f32) -> f32 {
    lerp(t, 7.0, 13.0)
}
ASM ```asm ; -O example::lerp: push rax movss xmm3, dword ptr [rip + .LCPI0_0] subss xmm3, xmm0 mulss xmm3, xmm1 mulss xmm0, xmm2 addss xmm0, xmm3 ucomiss xmm2, xmm1 jbe .LBB0_1 jb .LBB0_6 maxss xmm1, xmm0 minss xmm2, xmm1 movaps xmm1, xmm2 jmp .LBB0_5 .LBB0_1: ucomiss xmm1, xmm2 jb .LBB0_6 maxss xmm2, xmm0 minss xmm1, xmm2 .LBB0_5: movaps xmm0, xmm1 pop rax ret .LBB0_6: lea rdi, [rip + .L__unnamed_1] lea rdx, [rip + .L__unnamed_2] mov esi, 28 call qword ptr [rip + core::panicking::panic@GOTPCREL] ud2 example::lerp_known: movss xmm2, dword ptr [rip + .LCPI1_0] subss xmm2, xmm0 movss xmm3, dword ptr [rip + .LCPI1_1] mulss xmm2, xmm3 movss xmm1, dword ptr [rip + .LCPI1_2] mulss xmm0, xmm1 addss xmm0, xmm2 maxss xmm3, xmm0 minss xmm1, xmm3 movaps xmm0, xmm1 ret ; -O -target-cpu=skylake example::lerp: push rax vmovss xmm3, dword ptr [rip + .LCPI0_0] vsubss xmm3, xmm3, xmm0 vmulss xmm3, xmm3, xmm1 vmulss xmm0, xmm0, xmm2 vaddss xmm0, xmm3, xmm0 vucomiss xmm2, xmm1 jbe .LBB0_1 jb .LBB0_6 vmaxss xmm0, xmm1, xmm0 vminss xmm0, xmm2, xmm0 pop rax ret .LBB0_1: vucomiss xmm1, xmm2 jb .LBB0_6 vmaxss xmm0, xmm2, xmm0 vminss xmm0, xmm1, xmm0 pop rax ret .LBB0_6: lea rdi, [rip + .L__unnamed_1] lea rdx, [rip + .L__unnamed_2] mov esi, 28 call qword ptr [rip + core::panicking::panic@GOTPCREL] ud2 example::lerp_known: vmovss xmm1, dword ptr [rip + .LCPI1_0] vsubss xmm1, xmm1, xmm0 vmovss xmm2, dword ptr [rip + .LCPI1_1] vmulss xmm1, xmm1, xmm2 vmovss xmm3, dword ptr [rip + .LCPI1_2] vmulss xmm0, xmm0, xmm3 vaddss xmm0, xmm0, xmm1 vmaxss xmm0, xmm2, xmm0 vminss xmm0, xmm3, xmm0 ret ```

This implementation also contains panicking control flow, presumably for some combination of nonfinite inputs.


This is the last implementation that I've personally seen in use, though I know some others exist. This adds a consistency check to our exact implementation, but still fails at being monotonic or bounded.

pub fn lerp(t: f32, v0: f32, v1: f32) -> f32 {
    let naive = ((1.0 - t) * v0) + (t * v1);
    if v0 == v1 {
        v0
    } else {
        naive
    }
}
ASM ```asm ; -O example::lerp: movss xmm3, dword ptr [rip + .LCPI0_0] subss xmm3, xmm0 mulss xmm3, xmm1 mulss xmm0, xmm2 addss xmm0, xmm3 cmpeqss xmm2, xmm1 movaps xmm3, xmm2 andnps xmm3, xmm0 andps xmm2, xmm1 orps xmm2, xmm3 movaps xmm0, xmm2 ret example::lerp_known: movss xmm1, dword ptr [rip + .LCPI1_0] subss xmm1, xmm0 mulss xmm1, dword ptr [rip + .LCPI1_1] mulss xmm0, dword ptr [rip + .LCPI1_2] addss xmm0, xmm1 ret ; -O -target-cpu=skylake example::lerp: vmovss xmm3, dword ptr [rip + .LCPI0_0] vsubss xmm3, xmm3, xmm0 vmulss xmm3, xmm3, xmm1 vmulss xmm0, xmm0, xmm2 vaddss xmm0, xmm3, xmm0 vcmpeqss xmm2, xmm1, xmm2 vblendvps xmm0, xmm0, xmm1, xmm2 ret example::lerp_known: vmovss xmm1, dword ptr [rip + .LCPI1_0] vsubss xmm1, xmm1, xmm0 vmulss xmm1, xmm1, dword ptr [rip + .LCPI1_1] vmulss xmm0, xmm0, dword ptr [rip + .LCPI1_2] vaddss xmm0, xmm0, xmm1 ret ```

This is the one I currently use, though I might reconsider after doing this analysis. Note also that this doesn't actually contain any branching assembly, just a cmov, despite having a branch in the source.


And finally, the proposed implementation from C++ P0811R3. I make no assertion about whether it does or doesn't meet the promised properties; I just reproduce it here.

pub fn lerp(t: f32, v0: f32, v1: f32) -> f32 {
    // exact, monotonic, finite, determinate, and (for v0=v1=0) consistent:
    if (v0 <= 0.0 && v1 >= 0.0) || (v0 >= 0.0 && v1 <= 0.0) {
        return t * v1 + (1.0-t) * v0;
    }
    // exact
    if t == 1.0 {
        return v1;
    }
    // exact at t=0, monotonic except near t=1,
    // bounded, determinate, and consistent:
    let x = v0 + t * (v1 - v0);
    // monotonic near t=1
    if (t > 1.0) == (v1 > v0) {
        f32::max(v1, x)
    } else {
        f32::min(v1, x)
    }
}
ASM ```asm ; -O example::lerp: xorps xmm3, xmm3 ucomiss xmm3, xmm1 jb .LBB0_2 ucomiss xmm2, xmm3 jae .LBB0_9 .LBB0_2: ucomiss xmm1, xmm3 jb .LBB0_4 xorps xmm3, xmm3 ucomiss xmm3, xmm2 jb .LBB0_4 .LBB0_9: mulss xmm2, xmm0 movss xmm3, dword ptr [rip + .LCPI0_0] subss xmm3, xmm0 mulss xmm3, xmm1 addss xmm2, xmm3 .LBB0_10: movaps xmm0, xmm2 ret .LBB0_4: ucomiss xmm0, dword ptr [rip + .LCPI0_0] jne .LBB0_5 jnp .LBB0_10 .LBB0_5: movaps xmm3, xmm2 subss xmm3, xmm1 mulss xmm3, xmm0 addss xmm3, xmm1 ucomiss xmm0, dword ptr [rip + .LCPI0_0] jbe .LBB0_7 ucomiss xmm2, xmm1 jbe .LBB0_7 movaps xmm0, xmm2 cmpunordss xmm0, xmm2 movaps xmm1, xmm0 andps xmm1, xmm3 maxss xmm3, xmm2 jmp .LBB0_8 .LBB0_7: movaps xmm0, xmm2 cmpunordss xmm0, xmm2 movaps xmm1, xmm0 andps xmm1, xmm3 minss xmm3, xmm2 .LBB0_8: andnps xmm0, xmm3 orps xmm0, xmm1 ret example::lerp_known: ucomiss xmm0, dword ptr [rip + .LCPI1_0] jne .LBB1_1 jp .LBB1_1 movss xmm0, dword ptr [rip + .LCPI1_3] ret .LBB1_1: ucomiss xmm0, dword ptr [rip + .LCPI1_0] mulss xmm0, dword ptr [rip + .LCPI1_1] addss xmm0, dword ptr [rip + .LCPI1_2] jbe .LBB1_4 maxss xmm0, dword ptr [rip + .LCPI1_3] ret .LBB1_4: minss xmm0, dword ptr [rip + .LCPI1_3] ret ; -O -target-cpu=skylake example::lerp: vxorps xmm3, xmm3, xmm3 vucomiss xmm3, xmm1 jb .LBB0_2 vucomiss xmm2, xmm3 jae .LBB0_9 .LBB0_2: vucomiss xmm1, xmm3 jb .LBB0_4 vxorps xmm3, xmm3, xmm3 vucomiss xmm3, xmm2 jb .LBB0_4 .LBB0_9: vmulss xmm2, xmm0, xmm2 vmovss xmm3, dword ptr [rip + .LCPI0_0] vsubss xmm0, xmm3, xmm0 vmulss xmm0, xmm0, xmm1 vaddss xmm2, xmm0, xmm2 .LBB0_10: vmovaps xmm0, xmm2 ret .LBB0_4: vucomiss xmm0, dword ptr [rip + .LCPI0_0] jne .LBB0_5 jnp .LBB0_10 .LBB0_5: vsubss xmm3, xmm2, xmm1 vmulss xmm3, xmm3, xmm0 vaddss xmm3, xmm3, xmm1 vucomiss xmm0, dword ptr [rip + .LCPI0_0] jbe .LBB0_7 vucomiss xmm2, xmm1 jbe .LBB0_7 vmaxss xmm0, xmm3, xmm2 jmp .LBB0_8 .LBB0_7: vminss xmm0, xmm3, xmm2 .LBB0_8: vcmpunordss xmm1, xmm2, xmm2 vblendvps xmm0, xmm0, xmm3, xmm1 ret example::lerp_known: vucomiss xmm0, dword ptr [rip + .LCPI1_0] jne .LBB1_1 jp .LBB1_1 vmovss xmm0, dword ptr [rip + .LCPI1_3] ret .LBB1_1: vmulss xmm1, xmm0, dword ptr [rip + .LCPI1_1] vaddss xmm1, xmm1, dword ptr [rip + .LCPI1_2] vucomiss xmm0, dword ptr [rip + .LCPI1_0] jbe .LBB1_4 vmaxss xmm0, xmm1, dword ptr [rip + .LCPI1_3] ret .LBB1_4: vminss xmm0, xmm1, dword ptr [rip + .LCPI1_3] ret ```

The fact that this is so branchy, even for constant v0 and v1, means that (to a first order of approximation) no gamedev is ever going to use this implementation of lerp.


I think it would be a useful exercise to dig up what (if any) guarantees GPU shaders give on their implementation of lerp/mix, as well as hardware barycentric interpolations within triangles. Unfortunately I wasn't able to find any documentation on this, though I'm not familiar enough with GPU/shader documentation to realistically find such information.

koalefant commented 3 years ago

I think it would be a useful exercise to dig up what (if any) guarantees GPU shaders give on their implementation of lerp/mix, as well as hardware barycentric interpolations within triangles. Unfortunately I wasn't able to find any documentation on this, though I'm not familiar enough with GPU/shader documentation to realistically find such information.

Excerpts from lerp/mix used in shading languages:

Khronos documentation on GLSL mix: https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/mix.xhtml

mix performs a linear interpolation between x and y using a to weight between them. The return value is computed as x×(1−a)+y×a.

NVIDIA Cg documentation on lerp: https://developer.download.nvidia.com/cg/lerp.html

Returns the linear interpolation of a and b based on weight w.

a and b are either both scalars or both vectors of the same length. The weight w may be a scalar or a vector of the same length as a and b. w can be any value (so is not restricted to be between zero and one); if w has values outside the [0,1] range, it actually extrapolates.

lerp returns a when w is zero and returns b when w is one.

Microsoft HLSL documentation on lerp: https://docs.microsoft.com/is-is/windows/win32/direct3dhlsl/dx-graphics-hlsl-lerp

Linear interpolation is based on the following formula: x(1-s) + ys which can equivalently be written as x + s(y-x).

WGSL on mix: https://www.w3.org/TR/WGSL/

Returns the linear blend of e1 and e2 (e.g. e1 * (1 - e3) + e2 * e3). Component-wise when T is a vector.

Note that lerp does not do additional clamping there and can be used to extrapolate values.

koalefant commented 3 years ago

lerp in rust-gpu: https://embarkstudios.github.io/rust-gpu/api/glam/f32/struct.Vec3.html#method.lerp

Performs a linear interpolation between self and other based on the value s.

When s is 0.0, the result will be equal to self. When s is 1.0, the result will be equal to other.

Note that they also use the a.lerp(b, t) signature there.

clarfonthey commented 3 years ago

In terms of the calling convention, I personally feel very strongly about not splitting the bounds between self and the method arguments, but I'm not the only person in the Rust community, and if a reasonable number of people are concerned about breakage, it's worth opening a PR to change it.

bstrie commented 3 years ago

@clarfonthey Just wanted to make sure you're aware that a good amount of discussion has also happened on Zulip at https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/lerp.20API.20design/near/244050240 , if you'd like to also weigh in there.

halvnykterist commented 3 years ago

The calling convention for this bit me. I've never encountered a library where lerp was (t, a, b) rather than (a, b, t), and when I saw the warning that a new function collides with my own lerp interpolation on f32, I (foolishly) assumed I'd just be able to swap right over to it. I'm ambivalent on whether this should be implemented on ranges rather than floats (or even both), but please make the calling convention (a, b, t). I note that in the small survey of rust crates not a single one of them has t first.

CAD97 commented 3 years ago

interpolate( t, v0, v1 ) seems to be more common in more mathy applications, whereas lerp( v0, v1, t ) seems to be more common in more gamey applications.

Right now, based on the discussion in Zulip, my current feeling is that the next task here is to open a community RFC to determine

1) what properties of lerp are most desired to uphold for different applications, and which ones it's practical for std to provide;
2) from that, what implementation of lerp should be considered as the "standard" lerp; and
3) how a std lerp should be provided, such that not only f32/f64 are available to be lerped, but also f32x4 and na::DMatrix, and that those can be used as the t value in interpolation for wide enough types as well, with a compatible signature.

As a stretch goal, it would be nice if the signature could also be easily extended to support higher-order bezier interpolation (but that's a low order desire). (To be clear, I know how it can be done, but it's a question of supporting lerp first.)

koalefant commented 3 years ago

The calling convention for this bit me. I've never encountered a library where lerp was (t, a, b) rather than (a, b, t), and when I saw the warning that a new function collides with my own lerp interpolation on f32, I (foolishly) assumed I'd just be able to swap right over to it. I'm ambivalent on whether this should be implemented on ranges rather than floats (or even both), but please make the calling convention (a, b, t). I note that in the small survey of rust crates not a single one of them has t first.

That was exactly my experience. First reaction: sweet, there's now a standard lerp coming. Second: I have to reorder my lerp arguments, but only for f32. 🤦

programmerjake commented 3 years ago

As a stretch goal, it would be nice if the signature could also be easily extended to support higher-order bezier interpolation (but that's a low order desire).

pub trait Lerp<R = Self, const N: usize = 2> {
    type Output;
    fn lerp(self, control_points: [R; N]) -> Self::Output;
}

impl Lerp for f32 {
    type Output = f32;
    fn lerp(self, control_points: [f32; 2]) -> f32 {
        (1.0 - self) * control_points[0] + self * control_points[1]
    }
}

impl Lerp<f32, 3> for f32 {
    type Output = f32;
    /// quadratic bezier curve -- a lerp of lerps
    fn lerp(self, control_points: [f32; 3]) -> f32 {
        let nt = 1.0 - self;
        nt * nt * control_points[0] + nt * self * 2.0 * control_points[1] + self * self * control_points[2]
    }
}

bstrie commented 3 years ago

The calling convention for this bit me. I've never encountered a library where lerp was (t, a, b) rather than (a, b, t), and when I saw the warning that a new function collides with my own lerp interpolation on f32, I (foolishly) assumed I'd just be able to swap right over to it. I'm ambivalent on whether this should be implemented on ranges rather than floats (or even both), but please make the calling convention (a, b, t). I note that in the small survey of rust crates not a single one of them has t first.

Any API for this in std should balance user expectation with Rust idioms. I'm sympathetic to the familiarity of a, b, t from other languages. At the same time, a.lerp(b, t) reads like an abuse of method syntax. If it must be a, b, t, and if it isn't (a..b).lerp(t), then make this an associated function rather than a method: f32::lerp(a, b, t).

Also note that Rust isn't exactly a stranger to adapting APIs to make them more Rust-like, even at the expense of familiarity: see how ptr::copy_nonoverlapping deliberately has its arguments swapped compared to its C equivalent memcpy, prioritizing internal consistency over cross-language consistency.

programmerjake commented 3 years ago

Well, if you want t to be last, how about:

pub trait Lerp<T> {
    type Output;
    fn lerp(self, t: T) -> Self::Output;
}

impl Lerp<f32> for [f32; 2] {
    type Output = f32;
    fn lerp(self, t: f32) -> Self::Output {
        (1.0 - t) * self[0] + t * self[1]
    }
}

impl Lerp<f32> for [f32; 3] {
    type Output = f32;
    /// quadratic bezier curve -- a lerp of lerps
    fn lerp(self, t: f32) -> Self::Output {
        let nt = 1.0 - t;
        nt * nt * self[0] + nt * t * 2.0 * self[1] + t * t * self[2]
    }
}

Example use:

assert_eq!([4.0f32, 8.0].lerp(0.75f32), 7.0);

halvnykterist commented 3 years ago

At the same time, a.lerp(b, t) reads like an abuse of method syntax.

I use a.lerp(b, t) in several places; it feels perfectly natural to me. Read it as "lerp A towards B by T".

There are other methods on f32 in the standard library that might feel similarly "weird" to you, but are still not associated functions: y.atan2(x), a.min(b), a.max(b), x.hypot(y). If you don't like the look of them you can always just call them using associated function syntax.

And what I am asking for here is not (just) familiarity from other languages, but the existing rust ecosystem.

bstrie commented 3 years ago

Functions like a.min(b) are part of my point, as they are widely derided. That you can use UFCS to call them as f32::min(a, b) is true, but also highly obscure; I have had to explain this feature to many an experienced Rust user who came to me griping about a.min(b). The rationalization for the existence of a.min(b) is 1) we want a trait, so that people can bound on T: Ord, and because of how methods work we have to pick something to be the receiver, 2) there are only two arguments, so one of them has to be the receiver, and 3) even though Ord isn't implemented on floats, we want the interface to be consistent with integers, so we implement this as an inherent method.

But I don't see that these same rationalizations hold for lerp: 1) although people in here have proposed traits, the original PR just adds these as inherent methods on floats, 2) there are more than two arguments, so we don't need to split the receiver across the boundaries of a conceptual range (yuck); we can make the third argument the receiver (which users find unfamiliar) or we can coalesce the boundary arguments into a range, tuple, or array (which I see no arguments against yet), 3) there is no consistency to be gained from making these inherent methods on floats since there is no other equivalent method in std that we are trying to emulate; they can easily just be associated functions.

halvnykterist commented 3 years ago

Functions like a.min(b) are part of my point, as they are widely derided.

They are? If that's the case, shouldn't the examples in the docs at least be calling them as f32::min(a, b) then?

The rationalization for the existence of a.min(b) is 1) we want a trait, so that people can bound on T: Ord, and because of how methods work we have to pick something to be the receiver, 2) there are only two arguments, so one of them has to be the receiver, and 3) even though Ord isn't implemented on floats, we want the interface to be consistent with integers, so we implement this as an inherent method.

1 and 3 here don't hold for f32::atan2 and f32::hypot.

...or we can coalesce the boundary arguments into a range, tuple, or array (which I see no arguments against yet),..

My argument against this would be that you're introducing complexity where it doesn't need to exist, and would add noise to call sites. Granted, not a strong argument, but the benefits aren't particularly strong either.

koalefant commented 3 years ago

we can make the third argument the receiver

If lerp becomes a method of t it becomes impossible (or at least difficult) to overload it for other algebraic types such as vectors or quaternions.

we can coalesce the boundary arguments into a range, tuple, or array (which I see no arguments against yet),

All of these will require construction of useless temporary objects.

there is no consistency to be be gained from making these inherent methods on floats since there is no other equivalent method in std that we are trying to emulate; they can easily just be associated functions.

There are a number of existing algebraic crates (listed above) that use the a.lerp(b, t) convention. A single std function is unlikely to change this convention; it will only introduce fragmentation, misunderstanding, and inconsistency into the ecosystem.

programmerjake commented 3 years ago

we can make the third argument the receiver

If lerp becomes a method of t it becomes impossible (or at least difficult) to overload it for other algebraic types such as vectors or quaternions.

It's simple with a Lerp trait:


pub trait Lerp<T = [Self; 2]>: Sized {
    type Output;
    fn lerp(self, control_points: T) -> Self::Output;
}

impl Lerp<[Vec3; 2]> for f32 {
    type Output = Vec3;
    fn lerp(self, control_points: [Vec3; 2]) -> Self::Output {
        (1.0 - self) * control_points[0] + self * control_points[1]
    }
}

programmerjake commented 3 years ago

we can coalesce the boundary arguments into a range, tuple, or array (which I see no arguments against yet),

All of these will require construction of useless temporary objects.

They're not useless: arrays allow easily being generic over lerp and higher-order Bezier curves, while also making it extremely obvious which arguments are the endpoints and which argument is t. Just seeing a.lerp(b, c) you can't tell for sure if t is a or c without having to go see which specific definition the code's author is using today. For [a, b].lerp(c) or c.lerp([a, b]) it's completely obvious that any sane API designer would have t be c and [a, b] be the endpoints.

programmerjake commented 3 years ago

There are a number of existing algebraic crates (listed above) that use the a.lerp(b, t) convention. A single std function is unlikely to change this convention; it will only introduce fragmentation, misunderstanding, and inconsistency into the ecosystem.

There is Rust code that uses lerp(t, a, b): A project I partially finished several years ago: https://github.com/programmerjake/hashlife3d/blob/15551958ebbfe3ac6e58508a8f1cde9d797a6e0a/src/block/mod.rs#L715

I'm probably far from the only one...

koalefant commented 3 years ago

we can make the third argument the receiver

If lerp becomes a method of t it becomes impossible (or at least difficult) to overload it for other algebraic types such as vectors or quaternions.

It's simple with a Lerp trait:

pub trait Lerp<T = [Self; 2]>: Sized {
    type Output;
    fn lerp(self, control_points: T) -> Self::Output;
}

impl Lerp<[Vec3; 2]> for f32 {
    type Output = Vec3;
    fn lerp(self, control_points: [Vec3; 2]) -> Self::Output {
        (1.0 - self) * control_points[0] + self * control_points[1]
    }
}

Use of array causes suboptimal code generation (both debug and optimized) even for f32. Here is an example: https://godbolt.org/z/zT7Pr1Pjb

For larger types this trait is likely to cause two undesired copies.

programmerjake commented 3 years ago

Use of array causes suboptimal code generation (both debug and optimized) even for f32. Here is an example: https://godbolt.org/z/zT7Pr1Pjb

They produce identical code with optimizations on once the impls are decorated with #[inline] like they should be in std:

.set example::test_value, example::test_array

https://godbolt.org/z/o7esvxssa
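
For context, roughly the shape of such a comparison (my reconstruction, not the exact godbolt contents; only the test_value/test_array names are taken from the output above):

#[inline]
fn lerp_value(t: f32, v0: f32, v1: f32) -> f32 {
    (1.0 - t) * v0 + t * v1
}

#[inline]
fn lerp_array(t: f32, v: [f32; 2]) -> f32 {
    (1.0 - t) * v[0] + t * v[1]
}

pub fn test_value(t: f32) -> f32 {
    lerp_value(t, 7.0, 13.0)
}

pub fn test_array(t: f32) -> f32 {
    lerp_array(t, [7.0, 13.0])
}

With optimizations on, both exported functions compile to the same body, which is what the .set alias above is showing.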

If optimizations are turned off, passing arrays/separate-args to lerp is the least of your worries...

For larger types this trait is likely to cause two undesired copies.

That's inherent in having larger types passed across non-inlined function call boundaries -- caused by current rustc inheriting C's poor ABI decisions of not binpacking function arguments'/returns' fields in registers -- nothing specific to lerp.

programmerjake commented 3 years ago

An additional crate that uses fn lerp(t, a, b): https://docs.rs/splines/4.0.0/splines/interpolate/trait.Interpolate.html#tymethod.lerp

Has nearly 67k downloads.

mockersf commented 3 years ago

Since 1.55 the following code doesn't compile anymore:

trait MyLerp {
    fn lerp(&self) -> f32;
}
impl MyLerp for f32 {
    fn lerp(&self) -> f32 {
        0.1
    }
}
fn main() {
    println!("{}", 0.5_f32.lerp());
}

Because of error[E0658]: use of unstable library feature 'float_interpolation'

Even though I can't enable the feature as I'm using stable.

How could I fix this on stable until this feature is stabilised?

Mark-Simulacrum commented 3 years ago

println!("{}", MyLerp::lerp(&0.5_f32)); instead of invoking it through .lerp() is one way.
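
Spelled out against the snippet above, a minimal sketch of that workaround on stable:

trait MyLerp {
    fn lerp(&self) -> f32;
}

impl MyLerp for f32 {
    fn lerp(&self) -> f32 {
        0.1
    }
}

fn main() {
    // Calling through the trait path sidesteps method resolution, which prefers
    // the (unstable) inherent f32::lerp over the trait method.
    println!("{}", MyLerp::lerp(&0.5_f32));
    // The fully qualified form works as well:
    println!("{}", <f32 as MyLerp>::lerp(&0.5_f32));
}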

clarfonthey commented 3 years ago

I thought that unstable library features wouldn't be preferred if you didn't have the features enabled?

CAD97 commented 3 years ago

That's if the two items are ambiguous / are both trait methods, IIRC. Here f32::lerp is an inherent method, so it's prioritized over trait methods, even as unstable, apparently.

If we aren't actively pursuing progress on deciding the correct implementation and API for a std lerp[1], perhaps we should consider removing it for the time being to give the method space back. (I also didn't realize this was going to be an issue.)

[1] and as a reminder, I consider 3rd party math types being able to offer the same lerp API a hard requirement, because having differing APIs by requirements is awful

scottmcm commented 2 years ago

Although, on the other hand, the intricacy here makes me wonder whether lerp (for scalars) should live in num, for the same reason num was extracted from the stdlib to begin with.

Given the complexity here, this is where I think I lean right now. Especially since so much of the discussion is about "gamey" uses, where there's going to be another math library involved anyways for interpolation of vectors. That library can just provide it for scalars too (by an extension trait if it must). And maybe that's true for non-gamey uses too? Like perhaps those cases are all using ndarray or similar, which could provide this instead.

Then Rust can wait, and add one later should IEEE 754 decide to add it to a future version of their spec.


This might be enough if we say that lerp is not exact but is accurate to ±1 ULP rather than ±½ ULP (and document the overflow condition), assuming my understanding is correct.

FWIW there are many other floating-point methods which are also ±1 ULP, like tan, so that wouldn't be particularly out of place. Though I don't know that I'd expect this one to be worse than ±½ ULP.

CAD97 commented 2 years ago

RIP std lerp. Long live lerp 🦀

Ralith commented 2 years ago

Excerpts from lerp/mix used in shading languages:

Khronos documentation on GLSL mix

For posterity, this is actually a bit more subtle than those docs seem to suggest. Per Precision and Operation of SPIR-V Instructions from the Vulkan spec:

Where an operation’s precision is described as being inherited from a formula, the result returned must be at least as accurate as the result of computing an approximation to x using a formula equivalent to the given formula applied to the supplied inputs. Specifically, the formula given may be transformed using the mathematical associativity, commutativity and distributivity of the operators involved to yield an equivalent formula.

(emphasis added)

This caused me considerable pain in an application where I absolutely required the "exact" property, but nVidia implemented mix as t * (v1 - v0) + v0, whereas my graphics debugger implemented it as ((1.0 - t) * v0) + (t * v1), which made it very difficult to track down!

orekhoff commented 5 months ago

Why not?:

trait Lerp {
    fn lerp(self, t: f32) -> f32;
}

impl Lerp for (f32, f32) {
    fn lerp(self, t: f32) -> f32 {
        self.0 + t * (self.1 - self.0)
    }
}

Usage: (a, b).lerp(t)

clarfonthey commented 5 months ago

This has been closed, so, it's unlikely this would be accepted without a new ACP. However, there's still the issue of a.lerp(b, t) vs. t.lerp(a, b) which is unlikely to be resolved at this point. We currently have t.clamp(a, b) as a stable API so stabilising a.lerp(b, t) seems like a bad idea.

Your suggestion has been made in various forms above in the discussion already, so, I would suggest reading through and seeing why the decision was made to simply close this.

I think that, if we were to push the existing ecosystem to adopt t.lerp(a, b) and resolve the issues about what guarantees are worthwhile in the standard library, we could maybe make an ACP with these updates and potentially get this in. But without that, it seems highly unlikely.