Status: Open. johannesvollmer opened this issue 1 year ago.
We should also back that up with benchmarks. I doubt LTO is going to be much of a gain. `-C codegen-units=1` is also worth a shot. And if `-Ctarget-cpu=native` does indeed help, we should just identify the functions that get a speedup and multiversion them using the `multiversion` crate. That way the speedups will be accessible to everyone without the need to use `-Ctarget-cpu=native`.
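To make the multiversioning idea concrete, here is a minimal hand-written sketch of runtime dispatch using only the standard library. This is roughly the pattern the `multiversion` crate is meant to automate; it is not that crate's API, and `sum_squares` with its helpers are hypothetical names, not functions from exr.

```rust
// Hand-written runtime dispatch, std only.
// All function names here are hypothetical and for illustration.

/// Portable fallback, compiled for the baseline target.
fn sum_squares_scalar(values: &[f32]) -> f32 {
    values.iter().map(|v| v * v).sum()
}

/// Same loop, but compiled with AVX2 enabled, so the optimizer is
/// allowed to use AVX2 instructions inside this function.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_squares_avx2(values: &[f32]) -> f32 {
    values.iter().map(|v| v * v).sum()
}

/// Picks a version at run time based on what the CPU supports.
pub fn sum_squares(values: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: AVX2 support was checked on the line above.
            return unsafe { sum_squares_avx2(values) };
        }
    }
    // Any other architecture, or x86_64 without AVX2.
    sum_squares_scalar(values)
}

fn main() {
    let data = [1.0_f32, 2.0, 3.0];
    println!("{}", sum_squares(&data));
}
```

Callers only ever see `sum_squares`, so a binary built for the baseline target still gets the faster path on capable CPUs. Whether a given function actually benefits is something the benchmarks would have to show.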
great ideas :)
The latest version of the `half` crate now uses the f16 conversion intrinsics on stable Rust, so reading to f16 will be a lot faster on `half` v2.3.1 and later.
awesome, does that mean we can finally continue that pull request from some time ago? :) #191
What can be improved or is missing?
The documentation should provide the basic steps necessary to obtain the best performance: for example, setting the target CPU to native in `.cargo/config.toml`, but also LTO and feature flags in dependencies, like `half: use-intrinsics`. It should also say which exr variants are faster, and so on.
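As a starting point for such documentation, the settings mentioned in this thread could look roughly like the following. This is a sketch of standard Cargo and rustc options, not a benchmarked recommendation for exr; whether each one helps still needs to be measured.

```toml
# .cargo/config.toml (in the application that depends on exr)
# Compiles for the CPU of the build machine. The resulting binary
# may not run on older CPUs.
[build]
rustflags = ["-C", "target-cpu=native"]
```

```toml
# Cargo.toml (of the application)
[profile.release]
lto = "fat"          # whole-program link-time optimization
codegen-units = 1    # slower builds, more room for cross-function optimization
```

Dependency feature flags such as the one for `half` are left out here on purpose, because their availability depends on the crate version in use.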