Closed khodzha closed 4 years ago
Looking at the benchmark results, the situation has gotten worse :(
Could you write benchmarks for the single function so we can see what happens in it? (cargo-asm might also shed some light)
I added a simpler benchmark which doesn't change the output rate
I also took a look at the C implementation of speexdsp and
resampler_basic_direct_single
results with simpler bench:
resampler_simple_c time: [2.0033 ms 2.0225 ms 2.0387 ms]
resampler_simple_rust time: [2.7497 ms 2.7794 ms 2.7974 ms] (no unrolling, hsum via hadd)
resampler_simple_rust time: [2.1840 ms 2.2375 ms 2.3400 ms] (with unrolling, hsum via movehl/shuffle)
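For readers following along, here is a rough sketch of what "unroll plus hsum with movehl/shuffle" can look like in Rust. It is not the code from this PR: the names `hsum` and `inner_product` are made up for illustration, the unroll factor of 8 floats per iteration is an assumption, and it only uses `std::arch` SSE intrinsics (baseline on x86_64), with a scalar fallback on other targets.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: horizontal sum of the four f32 lanes using
/// movehl + shuffle instead of two `hadd` instructions.
#[cfg(target_arch = "x86_64")]
#[inline]
fn hsum(v: __m128) -> f32 {
    // SAFETY: only SSE instructions are used, and SSE is baseline on x86_64.
    unsafe {
        // lanes become [v0+v2, v1+v3, ..]
        let v = _mm_add_ps(v, _mm_movehl_ps(v, v));
        // broadcast lane 1 and add it into lane 0
        let v = _mm_add_ss(v, _mm_shuffle_ps::<0x55>(v, v));
        _mm_cvtss_f32(v)
    }
}

/// Hypothetical inner product, unrolled to 8 floats per iteration with
/// two independent accumulators; the tail is handled with scalar code.
#[cfg(target_arch = "x86_64")]
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    let len = a.len().min(b.len());
    let chunks = len / 8;
    // SAFETY: SSE is baseline on x86_64, and every unaligned load reads
    // 4 floats that lie entirely inside the first `chunks * 8` elements.
    let mut sum = unsafe {
        let mut acc0 = _mm_setzero_ps();
        let mut acc1 = _mm_setzero_ps();
        for i in 0..chunks {
            let pa = a.as_ptr().add(i * 8);
            let pb = b.as_ptr().add(i * 8);
            acc0 = _mm_add_ps(acc0, _mm_mul_ps(_mm_loadu_ps(pa), _mm_loadu_ps(pb)));
            acc1 = _mm_add_ps(
                acc1,
                _mm_mul_ps(_mm_loadu_ps(pa.add(4)), _mm_loadu_ps(pb.add(4))),
            );
        }
        hsum(_mm_add_ps(acc0, acc1))
    };
    // Scalar tail for the remaining len % 8 elements.
    for i in chunks * 8..len {
        sum += a[i] * b[i];
    }
    sum
}

/// Scalar fallback for non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
```

Calling `inner_product(&a, &b)` on two equal-length `f32` slices returns their dot product. The two accumulators are there so consecutive multiply-adds do not depend on each other, and the horizontal sum runs only once after the loop.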
It looks really nice :) there is still some overhead that should go away but it is a fairly good improvement :)
Great @khodzha! Thanks a lot! :)
results for doubles right now:
resampler_simple_rust_dbl
[7.7673 ms 7.9603 ms 8.1017 ms]
resampler_simple_c_dbl
[8.0655 ms 8.3503 ms 8.6117 ms]
I rebased and squashed the commits and marked the PR as ready for review, if you want to merge it
It still seems to have conflicts. Great result :)
not really, bench gives
[10.452 ms 10.509 ms 10.618 ms]
:man_shrugging: