Closed rerdavies closed 1 year ago
The double conversion is only done because the core NAM code currently accepts only double input (even though it runs float natively). That, in turn, is because the NAM VST plugin runs using doubles. I hope to get it changed to take float (or either) in the future.
It isn't a hot-path performance issue, though. The neural-net model processing is very expensive, so down- and up-converting a couple of times per buffer pass, while silly, isn't really a factor.
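For concreteness, the round trip looks roughly like this. This is only a sketch: `nam_process_double`, `process_float_buffer`, and the scratch-buffer layout are illustrative stand-ins, not the actual NAM API.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for a double-only model entry point (not the real
// NAM API). Here it just copies input to output in place of the model.
void nam_process_double(const double* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i];
}

// A float audio thread calling a double-only model must convert each buffer
// both ways. Scratch buffers are passed in so the audio thread doesn't
// allocate after the first call.
void process_float_buffer(const float* in, float* out, std::size_t n,
                          std::vector<double>& scratchIn,
                          std::vector<double>& scratchOut) {
    scratchIn.resize(n);
    scratchOut.resize(n);
    for (std::size_t i = 0; i < n; ++i)
        scratchIn[i] = in[i];                          // up-convert to double
    nam_process_double(scratchIn.data(), scratchOut.data(), n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<float>(scratchOut[i]);    // down-convert to float
}
```

The two conversion loops touch each sample once per pass, while the model itself does hundreds of multiply-accumulates per sample, which is why the conversions don't show up as a hot path.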
Performance of NN models is very dependent on their architecture. I haven't looked at your ToobML code, but I suspect it is using a much simpler architecture than the NAM WaveNet models.
Btw, I'm running on Raspberry Pi as well, and see similar performance. NAM "standard" models run, but just barely. The lighter-weight NAM architectures have more headroom.
Speaking of Raspberry Pi, we obviously think along similar lines.
This is what I run on my Pi:
https://blog.nostatic.org/2020/11/guitar-amplifier-and-pedal-simulation.html
And this is what my pedalboard looks like:
https://blog.nostatic.org/2023/03/neural-amp-modeler-nam-running-on.html
Performance is about half that of ToobAmp's (https://github.com/rerdavies/ToobAmp) TooB ML Amplifier on a Raspberry Pi 4.
It's not entirely a fair comparison, as the TooB ML Amplifier supports only a fixed set of models, in a single architecture, and its audio-thread code is custom code that has been tightly optimized for ARM NEON.
I suspect the major difference is attributable to the fact that you are using doubles, vs. the TooB ML Amplifier, which uses floats. (Yours is significantly better, by virtue of supporting uploadable .nam files.)
Is there any way to easily change the NAM models to run with float instead of double? I'm guessing that using float instead of double won't greatly affect quality, but the ability to use float vectorization should greatly improve performance.
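One low-friction way to get there would be to parameterize the kernels on the sample type, so the same code builds as either float or double. This is only a sketch under that assumption; `dense_layer` is an illustrative stand-in, not an actual NAM class.

```cpp
#include <cstddef>

// Sketch: a DSP kernel templated on the sample type. Built with
// Sample = float, the inner loop can auto-vectorize 4-wide on NEON;
// with Sample = double it only goes 2-wide.
template <typename Sample>
void dense_layer(const Sample* weights, const Sample* in, Sample* out,
                 std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        Sample acc = Sample(0);
        for (std::size_t c = 0; c < cols; ++c)
            acc += weights[r * cols + c] * in[c];  // multiply-accumulate
        out[r] = acc;
    }
}

// float build for the LV2/Pi path:  dense_layer<float>(...)
// double build for the VST host:    dense_layer<double>(...)
```

The host-facing API could then pick the instantiation at load time, so the VST double path keeps working while the Pi build gets the float one.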
For reference, neural-amp-modeler-lv2 uses about 55% of available CPU on a Raspberry Pi 4. Terrifying, but it surprisingly runs stably, without underruns. The TooB ML Amplifier, by comparison, uses about 17% of available CPU. ARM processors have 4x SIMD for floats, and GCC compilers optimize very well for ARM NEON. On the other hand, ARM NEON has only 2x SIMD for doubles, which, for most practical purposes, turns out to provide little value.
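Those lane counts fall straight out of NEON's 128-bit vector registers; a quick sketch of the arithmetic (the helper name is mine, just for illustration):

```cpp
#include <cstddef>

// NEON vector registers are 128 bits wide, so the SIMD lane count for a
// given sample type is 128 / (8 * sizeof(type)): 4 lanes for float, but
// only 2 for double -- half the theoretical throughput per instruction.
constexpr int neon_lanes(std::size_t sample_bytes) {
    return 128 / (8 * static_cast<int>(sample_bytes));
}

static_assert(neon_lanes(sizeof(float)) == 4, "4-wide float SIMD");
static_assert(neon_lanes(sizeof(double)) == 2, "2-wide double SIMD");
```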