Closed Finomnis closed 1 year ago
We can use unstable core intrinsics (I'd prefer a `nightly` feature for that, though keeping up with `nightly` changes is always a bit painful), but if the goal is primarily optimizing `thumbv7em-none-eabihf`, it seems like inline assembly would be a stable alternative?
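A minimal sketch of what the inline-assembly route could look like (assumptions: the `vsqrt.f32` instruction and the `sreg` register class on an ARM target with VFP; the Newton-iteration fallback is only there so the sketch compiles and runs on any host — it is not micromath's algorithm):

```rust
// Stable inline-assembly sqrt for ARM targets with a VFP unit,
// e.g. thumbv7em-none-eabihf (hypothetical sketch, not micromath's API).
#[cfg(all(target_arch = "arm", target_feature = "vfp2"))]
fn hw_sqrt(x: f32) -> f32 {
    let r: f32;
    unsafe {
        core::arch::asm!("vsqrt.f32 {r}, {x}", r = out(sreg) r, x = in(sreg) x);
    }
    r
}

// Portable stand-in so this example is self-contained on other hosts.
#[cfg(not(all(target_arch = "arm", target_feature = "vfp2")))]
fn hw_sqrt(x: f32) -> f32 {
    if x <= 0.0 {
        return if x == 0.0 { 0.0 } else { f32::NAN };
    }
    let mut r = x;
    for _ in 0..16 {
        r = 0.5 * (r + x / r); // Newton-Raphson step
    }
    r
}

fn main() {
    println!("sqrt(2) ≈ {}", hw_sqrt(2.0));
}
```

The `cfg` split is what makes this "stable but target-specific": each supported target needs its own assembly, which is the maintenance trade-off discussed below.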
> if the goal is primarily optimizing `thumbv7em-none-eabihf`, it seems like inline assembly would be a stable alternative?
Well yes, that is true, but implementing it via intrinsics would make this compatible with all boards, not just this one. And inline assembly isn't optimizable by LLVM, so things like compile-time evaluation wouldn't happen.
This is my current solution in a project (note that `core::intrinsics` requires a nightly compiler and `#![feature(core_intrinsics)]`):

```rust
pub trait F32Ext {
    /// Sine
    fn sin(self) -> Self;
    /// Cosine
    fn cos(self) -> Self;
}

// Fallback implementations; LLVM emits calls to these symbols
// if the target has no hardware support for the intrinsic.
#[no_mangle]
extern "C" fn sinf(val: f32) -> f32 {
    micromath::F32(val).sin().0
}

#[no_mangle]
extern "C" fn cosf(val: f32) -> f32 {
    micromath::F32(val).cos().0
}

impl F32Ext for f32 {
    fn sin(self) -> f32 {
        unsafe { core::intrinsics::sinf32(self) }
    }
    fn cos(self) -> f32 {
        unsafe { core::intrinsics::cosf32(self) }
    }
}
```
Surely I'm not the only person who has come across this issue, so maybe it's worth including something like this in a library :)
But if you think it's too much effort to maintain, I'll just keep it in my local project.
> Well yes, that is true, but implementing it via intrinsics would make this compatible with all boards, not just this one
Shouldn't the optimizations work on all boards with a `thumbv7em-none-eabihf` MCU? I don't see why the assembly would vary by board.
> And inline assembly isn't optimizable by LLVM, so things like compile-time evaluation wouldn't happen.
While that's a minor drawback, it's one I think is outweighed by the `nightly` vs `stable` question. `nightly` is a big maintenance burden because it's a moving target, which means things may break from one nightly release to the next.
> Shouldn't the optimizations work on all boards with a `thumbv7em-none-eabihf` MCU? I don't see why the assembly would vary by board.
I might have misworded it; I mean it would be compatible with all targets then, not just `thumbv7em-none-eabihf`. Basically, every other target would then also automatically choose either a hardware implementation or `micromath` as a fallback, instead of using the `micromath` implementation by default. Even other `micromath` functions that depend on each other would benefit if some of the calls were hardware accelerated.
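The per-target selection described here could be sketched roughly as follows (assumptions: a hypothetical crate feature named `unstable_intrinsics`, and a truncated Taylor series standing in for micromath's actual software implementation):

```rust
// Hypothetical sketch of dispatch behind a crate feature flag.
// With the feature enabled (nightly only), LLVM lowers the intrinsic to a
// hardware instruction where one exists, or to an extern "C" fallback
// otherwise; without the feature, the software path is used directly.

// Stand-in software implementation (truncated Taylor series; NOT
// micromath's actual algorithm).
fn sin_soft(x: f32) -> f32 {
    let x2 = x * x;
    // x - x^3/6 + x^5/120 - x^7/5040, in Horner form
    x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)))
}

#[cfg(feature = "unstable_intrinsics")]
fn sin(x: f32) -> f32 {
    // Requires nightly and #![feature(core_intrinsics)].
    unsafe { core::intrinsics::sinf32(x) }
}

#[cfg(not(feature = "unstable_intrinsics"))]
fn sin(x: f32) -> f32 {
    sin_soft(x)
}

fn main() {
    println!("sin(0.5) ≈ {}", sin(0.5));
}
```

The point of the indirection is that callers (including other functions inside the crate) always go through `sin`, so they transparently pick up whichever backend the feature selects.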
> While that's a minor drawback, it's one I think is outweighed by the `nightly` vs `stable` question. `nightly` is a big maintenance burden because it's a moving target, which means things may break from one nightly release to the next.
And what if we add a note to the feature flag's description saying that we give zero stability guarantees for it? The rest of the project (if the feature isn't enabled) should of course stay compatible with `stable`.
Of course, if you say it's too much of a hassle, I might release it as a separate package.
I have `nightly` features on other projects and it can be quite annoying. Even if the nightly is pinned, there can be sporadic breakages (e.g. an ICE).

This crate is low-maintenance enough, however, that I guess I'd be OK with it, so long as the nightly were pinned and we don't get frequent requests to bump the nightly version and cut releases due to breakages, like I experience with other projects.
I started with some benchmarks, and it seems like only `abs` and `sqrt` are available on `thumbv7em-none-eabihf`. And sadly, it makes barely a difference. (Measured on a Teensy 4.0.)
```
===== Micromath Benchmark =====
Git Version: 9a7492b-modified

All values in ns/iter.

           micromath      libm  intrinsics
abs             16.6      16.6        16.7
acos           178.0     120.3
asin           110.1     139.6
atan            85.0      78.5
atan_norm       76.8
ceil            48.4      58.1
cos            105.0    1668.0
exp            233.5     116.8
floor           38.5      61.4
fract           21.6
inv             16.6
invsqrt         16.6
ln             161.8     157.4
log2           166.9     177.6
log10          166.8     174.2
round           36.6
sin            110.0    1707.4
sqrt            53.4     491.8        38.4
tan            160.1    2560.6
trunc           26.3      45.0
```
There would still be the advantages of being compile-time optimizable and being more accurate. But it seems `micromath` is really well optimized already. Kudos.
@tarcieri Should I create a PR with those benchmarks? It only includes the one-parameter functions so far, and I probably won't add more, since I will no longer use intrinsics myself now that I know how small the gain is.
Either way, here they are in case you are curious: https://github.com/Finomnis/micromath/tree/teensy40_benchmark/examples/benchmark_teensy40
Yes sure, that'd be great
@tarcieri #98
@tarcieri Btw, it seems that some of the `libm` implementations are faster than ours, at least on the Teensy 4.0. Might be worth investigating. The biggest difference seems to be `acos`. Note that I'm only running each function repeatedly on a single value, so I'm not sure how input-dependent the timings are.
Will close this for now as I won't continue working on it.
> it seems that some of the libm implementations are faster than ours
Yeah, that's an interesting data point, although this library optimizes for code size over performance.
I can investigate if there are any implementations which are both faster and relatively similar in terms of code size and potentially adopt those.
Many targets, like `thumbv7em-none-eabihf`, have a hardware floating point unit which supports operations like `f32::abs()` or `f32::sqrt()`. Using those would improve performance and reduce code footprint.

Sadly, in `no-std`, those are currently only obtainable through the unstable `core::intrinsics` API. A discussion about adding those to `stable` exists, but has been stale for quite a while now. Still, for maximum performance, it would be worth it to add an `unstable` feature flag (or `experimental_intrinsics` or similar) which compiles directly to intrinsics.

This would work the following way. LLVM intrinsics can be compiled in three different ways by LLVM, depending on the circumstances. For example, `sqrtf32()` can be:

- evaluated at compile time, if the arguments are constant;
- compiled to a hardware instruction, if the target supports one (e.g. `thumbv7em-none-eabihf`);
- compiled to a call to an external function `sqrtf32` as a fallback if no hardware support exists. This function has to be provided by us, via `extern "C"` and `#[no_mangle]`.
Advantages:

- better performance and smaller code on targets with hardware support
- compile-time evaluation of constant arguments becomes possible
- all other targets transparently keep the software implementation as a fallback

Disadvantages:

- requires an unstable compiler feature, so the flag only works on `nightly`, which is a moving target to maintain
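The fallback path described in the issue can be sketched like this (assumptions: the symbol name `sqrtf32` follows the issue text, and the Newton-iteration body is a stand-in for whatever pure-software implementation the crate would actually provide):

```rust
// Sketch of the extern "C" fallback: if LLVM cannot lower the sqrt
// intrinsic to a hardware instruction on the current target, it emits a
// call to this symbol instead. (The body here is a stand-in
// Newton-Raphson iteration, not a real library implementation.)
#[no_mangle]
extern "C" fn sqrtf32(x: f32) -> f32 {
    if x <= 0.0 {
        return if x == 0.0 { 0.0 } else { f32::NAN };
    }
    let mut r = x;
    for _ in 0..16 {
        r = 0.5 * (r + x / r); // Newton-Raphson step
    }
    r
}

fn main() {
    // Calling the symbol directly, just to exercise the sketch; in real
    // use, only LLVM's lowering of the intrinsic would call it.
    println!("sqrt(2) ≈ {}", sqrtf32(2.0));
}
```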
If this sparks interest from the crate developer, I might start an implementation and open a PR. If not, I'm fine with having this issue closed.