ryukau closed this issue 2 years ago
Some observations:
At a sample rate of 44100 Hz and a small buffer size of 128 samples, CPU spikes overload an Apple M1 CPU core, but the same configuration performs better on a Win64 i7-3820 (CPU spikes don't exceed 70%).
@nekotaro6 I managed to reduce CPU spikes. Could you test the following build?
Some details: There is a trade-off: lowering the spikes increases the average load. The CPU load at spikes on v0.1.6 (the build linked above) is roughly 10% of the load at spikes on v0.1.5 (current master). On v0.1.6, the spikes occur every 128 samples.
The following is a fancy CPU load graph from a micro benchmark.
@ryukau I have tested the new build. On both the M1 Mac and Windows, the tests show a small increase in average load, and the CPU spikes are only a little above the average load. With buffer size 128:
Thanks for testing. I'm now confident that the changes work without issue. If something is not working as expected, feel free to add a comment or open a new issue.
The changes are pushed, and they will be available in a few minutes as a new release.
The spikes are part of the intended behavior, but they can be reduced.
The convolution algorithm used in MiniCliffEQ is based on the "minimal computation cost solution" in the following paper:
In the second paragraph on page 3, it says: "Although this is a minimum cost solution in terms of computation per output sample, it is completely impractical. This is because all the computation must be done during short periods of time."
The paper was written in 1993, and the minimum cost solution is not completely impractical on recent CPUs. In practice, it works when the buffer size is sufficiently large (link). That is sufficient for the intended use case, which is to suppress the direct current right before rendering.
The paper goes on to discuss an algorithm without CPU spikes, which distributes the FFT computation across samples. So this issue can be solved by implementing it. The real problem is how much time it takes to implement the feature.
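To illustrate why the spikes depend on the buffer size, here is a toy cost model. It is a sketch under simplifying assumptions, not the code in MiniCliffEQ: `workPerBuffer`, `peak`, `average`, and the abstract "work units" are all hypothetical names. It assumes one FFT step of fixed cost per block, and compares doing that step all at once at the block boundary against spreading it evenly over the samples of the block.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical cost model, not MiniCliffEQ code.
// Returns the work units spent inside each of `nBuffers` host callbacks.
// One FFT step costing `workPerBlock` units is due every `blockSize` samples.
std::vector<double> workPerBuffer(
  std::size_t blockSize,
  std::size_t bufferSize,
  double workPerBlock,
  bool distributed,
  std::size_t nBuffers)
{
  std::vector<double> work(nBuffers, 0.0);
  std::size_t sampleIndex = 0;
  for (std::size_t buf = 0; buf < nBuffers; ++buf) {
    for (std::size_t i = 0; i < bufferSize; ++i) {
      ++sampleIndex;
      if (distributed) {
        // Spread the FFT cost evenly over every sample of the block.
        work[buf] += workPerBlock / double(blockSize);
      } else if (sampleIndex % blockSize == 0) {
        // Do the whole FFT step at the block boundary (the spike).
        work[buf] += workPerBlock;
      }
    }
  }
  return work;
}

// Largest work seen in a single callback. Assumes `w` is not empty.
double peak(const std::vector<double> &w)
{
  return *std::max_element(w.begin(), w.end());
}

// Mean work per callback. Assumes `w` is not empty.
double average(const std::vector<double> &w)
{
  return std::accumulate(w.begin(), w.end(), 0.0) / double(w.size());
}
```

With `blockSize = 16384` and `bufferSize = 128`, the all-at-once schedule puts the whole FFT step into one callback out of 128, so its peak is 128 times its average, while the distributed schedule has a peak equal to its average. With `bufferSize = 16384`, every callback contains exactly one block boundary, so the peak equals the average even without distributing the work. The model ignores any extra overhead of the distributed scheme, so it does not reproduce the increase in average load.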
Relevant code is MiniCliffEQ/source/dsp/fftconvolver.hpp. I'll revisit this issue later, but it's not top priority because there's a workaround.
The FIR filter used in MiniCliffEQ introduces a latency of 2^14 samples, so the following steps can be used as a workaround:
Edited on 2022-06-24 to add clarification.