sdatkinson / NeuralAmpModelerCore

Core DSP library for NAM plugins
MIT License

Do not allocate memory during WaveNet processing #49

Open · mikeoliphant opened this issue 1 year ago

mikeoliphant commented 1 year ago

Currently, the WaveNet model processing code resizes vectors and matrices based on the audio buffer size during processing. This is non-ideal for real-time operation. Instead, all sizing operations should be done outside of the processing loop.

In most cases, the current behavior should not cause significant problems: if there is a fixed audio buffer size, the resize operations should only happen once. A fixed buffer size is not guaranteed, however; DAWs will sometimes vary the block size.

daleonov commented 1 year ago

Agreed. There should be some kind of prepareBuffers() method. There's always some method that is called by the DAW every time the maximum block size or sample rate changes (prepareToPlay() in JUCE, and there was something similar in iPlug), so it happens at least once when the session starts, and it happens outside the audio thread. It's the only safe place to resize the buffers. Also, like Mike mentioned, the block size can vary. A good example is a looped selection in the DAW: the last block in that loop is always smaller than usual, and then the next one is back to normal size. The trick is that there's usually no prepareToPlay() callback from the DAW in that case, so you have to size down, and then size back up, with no memory allocation.

mikeoliphant commented 1 year ago

Yes - typically this would be handled by allocating for the max size, but then only processing the number of samples you are given.

Eigen has a way to specify the maximum matrix size, but unfortunately it is at compile time:

https://eigen.tuxfamily.org/dox/classEigen_1_1Matrix.html

It should be possible to create matrices/vectors at max size, and then just do block operations on them at the current given size.
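As a rough illustration of that approach (not code from the repo; the class and method names here are hypothetical), one could allocate at the maximum expected size once and then operate on leftCols() views at the current block size. Eigen's compile-time alternative is the MaxRowsAtCompileTime/MaxColsAtCompileTime template parameters mentioned in the link above.

#include <Eigen/Dense>

// Sketch of the "allocate once, operate on blocks" idea.
class ConvBuffers
{
public:
  // Called outside the audio thread, e.g. when the max block size changes.
  void PrepareBuffers(long channels, long maxNumFrames)
  {
    mWork.resize(channels, maxNumFrames); // the only heap allocation
  }

  // Called on the audio thread with numFrames <= maxNumFrames.
  void Process(const Eigen::MatrixXf& input, long numFrames)
  {
    // leftCols() returns a view into existing storage, so no heap
    // allocation happens here as long as numFrames fits the buffer.
    mWork.leftCols(numFrames).noalias() = input.leftCols(numFrames) * 2.0f;
  }

private:
  Eigen::MatrixXf mWork;
};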

sdatkinson commented 1 year ago

Sounds reasonable--I had started to move some things out in the iPlug2 plugin, but yeah, this reeks of me getting back into C++ for this project 😅

Good example is when you get looped selection in the DAW, the last block in that loop is always smaller than usual, and then the next one is back to normal size.

Did not know--great call!

Let me know if either of you want to take this on--happy to assign it 👍🏻

mikeoliphant commented 1 year ago

yeah, this reeks of me getting back into C++

I feel your pain...

Let me know if either of you want to take this on--happy to assign it

I can probably look into sorting this out.

olilarkin commented 8 months ago

I just ran NAM audiounit with Apple's auval real time safety checker and it pointed to a couple of things...


Realtime-safety violation:
                libsystem_malloc.dylib`malloc
                NeuralAmpModeler`Eigen::internal::aligned_malloc(unsigned long)+0x38
                NeuralAmpModeler`void* Eigen::internal::conditional_aligned_malloc<true>(unsigned long)+0x18
                NeuralAmpModeler`float* Eigen::internal::conditional_aligned_new_auto<float, true>(unsigned long)+0x64
                NeuralAmpModeler`Eigen::DenseStorage<float, -1, -1, -1, 0>::resize(long, long, long)+0x78
                NeuralAmpModeler`Eigen::PlainObjectBase<Eigen::Matrix<float, -1, -1, 0, -1, -1>>::resize(long, long)+0x1f0
                NeuralAmpModeler`Eigen::internal::Assignment<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>, Eigen::internal::assign_op<float, float>, Eigen::internal::Dense2Dense, void>::run(Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&, Eigen::internal::assign_op<float, float> const&)+0x78
                NeuralAmpModeler`void Eigen::internal::call_assignment_no_alias<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>, Eigen::internal::assign_op<float, float>>(Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&, Eigen::internal::assign_op<float, float> const&)+0x30
                NeuralAmpModeler`Eigen::Matrix<float, -1, -1, 0, -1, -1>& Eigen::PlainObjectBase<Eigen::Matrix<float, -1, -1, 0, -1, -1>>::_set_noalias<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::DenseBase<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>> const&)+0x3c
                NeuralAmpModeler`void Eigen::PlainObjectBase<Eigen::Matrix<float, -1, -1, 0, -1, -1>>::_init1<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::DenseBase<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>> const&)+0x20
                NeuralAmpModeler`Eigen::Matrix<float, -1, -1, 0, -1, -1>::Matrix<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&)+0x2c
                NeuralAmpModeler`Eigen::Matrix<float, -1, -1, 0, -1, -1>::Matrix<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&)+0x24
                NeuralAmpModeler`void Eigen::internal::call_assignment<Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1>, -1, -1, true>, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>, Eigen::internal::add_assign_op<float, float>>(Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1>, -1, -1, true>&, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&, Eigen::internal::add_assign_op<float, float> const&, std::__1::enable_if<evaluator_assume_aliasing<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>::value, void*>::type)+0x2c
                NeuralAmpModeler`Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1>, -1, -1, true>& Eigen::MatrixBase<Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1>, -1, -1, true>>::operator+=<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::MatrixBase<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>> const&)+0x40
                NeuralAmpModeler`nam::Conv1D::process_(Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, long, long, long) const+0x174
                NeuralAmpModeler`nam::wavenet::_Layer::process_(Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, long, long)+0x70
                NeuralAmpModeler`nam::wavenet::_LayerArray::process_(Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&)+0x1bc
                NeuralAmpModeler`nam::wavenet::WaveNet::process(float*, float*, int)+0x150
                NeuralAmpModeler`NeuralAmpModeler::ProcessBlock(float**, float**, int)::$_10::operator()(float**, float**, int) const+0x4c
                NeuralAmpModeler`decltype(std::declval<NeuralAmpModeler::ProcessBlock(float**, float**, int)::$_10&>()(std::declval<float**>(), std::declval<float**>(), std::declval<int>())) std::__1::__invoke[abi:v160006]<NeuralAmpModeler::ProcessBlock(float**, float**, int)::$_10&, float**, float**, int>(NeuralAmpModeler::ProcessBlock(float**, float**, int)::$_10&, float**&&, float**&&, int&&)+0x3c

  Realtime-safety violation:
                libsystem_malloc.dylib`free
                NeuralAmpModeler`Eigen::internal::aligned_free(void*)+0x34
                NeuralAmpModeler`void Eigen::internal::aligned_delete<float>(float*, unsigned long)+0x28
                NeuralAmpModeler`Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false>::~gemm_blocking_space()+0x24
                NeuralAmpModeler`Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false>::~gemm_blocking_space()+0x1c
                NeuralAmpModeler`void Eigen::internal::generic_product_impl<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, Eigen::DenseShape, Eigen::DenseShape, 8>::scaleAndAddTo<Eigen::Matrix<float, -1, -1, 0, -1, -1>>(Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true> const&, float const&)+0x360
                NeuralAmpModeler`void Eigen::internal::generic_product_impl<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, Eigen::DenseShape, Eigen::DenseShape, 8>::evalTo<Eigen::Matrix<float, -1, -1, 0, -1, -1>>(Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true> const&)+0xb0
                NeuralAmpModeler`Eigen::internal::Assignment<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>, Eigen::internal::assign_op<float, float>, Eigen::internal::Dense2Dense, void>::run(Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&, Eigen::internal::assign_op<float, float> const&)+0xa8
                NeuralAmpModeler`void Eigen::internal::call_assignment_no_alias<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>, Eigen::internal::assign_op<float, float>>(Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&, Eigen::internal::assign_op<float, float> const&)+0x30
                NeuralAmpModeler`Eigen::Matrix<float, -1, -1, 0, -1, -1>& Eigen::PlainObjectBase<Eigen::Matrix<float, -1, -1, 0, -1, -1>>::_set_noalias<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::DenseBase<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>> const&)+0x3c
                NeuralAmpModeler`void Eigen::PlainObjectBase<Eigen::Matrix<float, -1, -1, 0, -1, -1>>::_init1<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::DenseBase<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>> const&)+0x20
                NeuralAmpModeler`Eigen::Matrix<float, -1, -1, 0, -1, -1>::Matrix<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&)+0x2c
                NeuralAmpModeler`Eigen::Matrix<float, -1, -1, 0, -1, -1>::Matrix<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&)+0x24
                NeuralAmpModeler`void Eigen::internal::call_assignment<Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1>, -1, -1, true>, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>, Eigen::internal::add_assign_op<float, float>>(Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1>, -1, -1, true>&, Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0> const&, Eigen::internal::add_assign_op<float, float> const&, std::__1::enable_if<evaluator_assume_aliasing<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>::value, void*>::type)+0x2c
                NeuralAmpModeler`Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1>, -1, -1, true>& Eigen::MatrixBase<Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1>, -1, -1, true>>::operator+=<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>>(Eigen::MatrixBase<Eigen::Product<Eigen::Matrix<float, -1, -1, 0, -1, -1>, Eigen::Block<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, -1, -1, true>, 0>> const&)+0x40
                NeuralAmpModeler`nam::Conv1D::process_(Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, long, long, long) const+0x174
                NeuralAmpModeler`nam::wavenet::_Layer::process_(Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, long, long)+0x70
                NeuralAmpModeler`nam::wavenet::_LayerArray::process_(Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1> const&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&)+0x1bc
                NeuralAmpModeler`nam::wavenet::WaveNet::process(float*, float*, int)+0x150
                NeuralAmpModeler`NeuralAmpModeler::ProcessBlock(float**, float**, int)::$_10::operator()(float**, float**, int) const+0x4c

full transcript (from my NAM version...)

validata.txt

olilarkin commented 8 months ago

If you set the EIGEN_RUNTIME_NO_MALLOC preprocessor macro and then ...

Eigen::internal::set_is_malloc_allowed(false);
dsp->process();
dsp->finalize_(nFrames);
Eigen::internal::set_is_malloc_allowed(true);

It shows that every single call to process is calling malloc/free. This is bad, and fixing it might save quite a few CPU cycles, not to mention preventing some potential glitches.
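For reference, a self-contained version of that check might look like the sketch below. It assumes the process(float*, float*, int) signature visible in the stack traces above; the wrapper name and include path are illustrative, not part of the library.

// EIGEN_RUNTIME_NO_MALLOC must be defined before any Eigen header is
// included (e.g. -DEIGEN_RUNTIME_NO_MALLOC in debug builds); Eigen then
// asserts whenever it heap-allocates while the flag is set to false.
#define EIGEN_RUNTIME_NO_MALLOC
#include <Eigen/Dense>
#include "NAM/dsp.h"

void ProcessBlockChecked(nam::DSP& dsp, float* input, float* output, int nFrames)
{
  Eigen::internal::set_is_malloc_allowed(false);
  dsp.process(input, output, nFrames); // asserts if Eigen allocates here
  dsp.finalize_(nFrames);
  Eigen::internal::set_is_malloc_allowed(true);
}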

olilarkin commented 4 months ago

Maybe a clue here:

https://github.com/stulp/eigenrealtime

rerdavies commented 3 weeks ago

I just pushed a pull request that cleans up realtime memory allocations caused by Eigen temporary matrices. With the changes applied, NeuralAmpModelerCore no longer allocates memory on any process call except the first. Net results: a 20% performance improvement (enormously valuable when running on Pi 4s), a substantial reduction in CPU use jitter, and it likely avoids progressively worse performance over time as NAM and other plugins that unwisely allocate memory fragment the realtime thread's heap.

Hosts can ensure that buffers for Eigen MatrixXfs are pre-allocated by processing one sample off the realtime thread before allowing the model to run on the realtime thread. Currently, MatrixXf memory is allocated during the first processing cycle.

@sdatkinson

It might be useful to introduce an Activate() method on DSPs, the implementation of which would just run the model for one cycle. Or even do it as part of get_dsp().
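A hedged sketch of what such an Activate()/warm-up could look like is below; the function and the exact process() signature are assumptions following the stack traces above, not an existing API in the repo.

#include <vector>
#include "NAM/dsp.h"

// Illustrative warm-up only: one pass of silence forces Eigen to size its
// internal matrices off the realtime thread, so later realtime calls
// should not need to allocate.
void Activate(nam::DSP& dsp, int maxBlockSize)
{
  std::vector<float> input(maxBlockSize, 0.0f);
  std::vector<float> output(maxBlockSize, 0.0f);
  dsp.process(input.data(), output.data(), maxBlockSize);
  dsp.finalize_(maxBlockSize);
}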