
Benchmark/profile baselines to identify performance bottlenecks #6

Closed vvolhejn closed 2 years ago

vvolhejn commented 2 years ago

This is the most important figure:

[Screenshot (2022-04-11): runtime breakdown per component, DDSP model on the left, RAVE-like model on the right]

DDSP

On the left is the DDSP autoencoder model with two performance improvements applied.

After these improvements, 50% of DDSP's runtime is spent in non-trainable components (the `processor_group`).
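
For context, below is a minimal sketch of the kind of per-component timing that can produce such a breakdown. The names `encoder`, `decoder`, and `processor_group` are placeholders for the actual DDSP submodules, not the real API:

```python
import time
from collections import defaultdict

def profile_components(components, inputs, n_runs=100):
    """Time each named stage of a simple pipeline and report its share
    of the total runtime. `components` maps name -> callable; each
    callable consumes the previous stage's output."""
    totals = defaultdict(float)
    for _ in range(n_runs):
        x = inputs
        for name, fn in components.items():
            start = time.perf_counter()
            x = fn(x)
            totals[name] += time.perf_counter() - start
    total = sum(totals.values())
    for name, t in totals.items():
        print(f"{name:>16}: {1000 * t / n_runs:7.2f} ms/run "
              f"({100 * t / total:5.1f} %)")

# Hypothetical usage with DDSP-autoencoder-like stages:
# profile_components(
#     {"encoder": encoder, "decoder": decoder, "processor_group": processor_group},
#     inputs=audio_features,
# )
```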

RAVE

On the right is a RAVE-like model: a dilated-CNN encoder and decoder sandwiched between the analysis and synthesis stages of a 16-band PQMF decomposition. The noise generator is not included here, because including it led to the network mimicking the spectrum using only the noise generator instead of the waveform generator. (This still needs further inspection.)
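
For clarity, here is a rough sketch of the data flow just described, assuming a generic stack of dilated causal convolutions for the encoder/decoder and a `pqmf` object with `analysis`/`synthesis` methods; the names and shapes are illustrative, not the actual implementation:

```python
import tensorflow as tf

N_BANDS = 16  # PQMF bands, as in the plot

def dilated_cnn(filters, n_layers, kernel_size=3):
    """A simple stack of exponentially dilated causal 1-D convolutions."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(filters, kernel_size, dilation_rate=2 ** i,
                               padding="causal", activation="relu")
        for i in range(n_layers)
    ])

encoder = dilated_cnn(filters=64, n_layers=4)
decoder = dilated_cnn(filters=N_BANDS, n_layers=4)

def forward(audio, pqmf):
    """audio: [batch, time, 1]; pqmf: hypothetical object exposing the
    analysis/synthesis of the 16-band decomposition."""
    bands = pqmf.analysis(audio)      # [batch, time / N_BANDS, N_BANDS]
    latent = encoder(bands)           # trainable, dominates runtime
    bands_hat = decoder(latent)       # predicts the multiband waveform
    return pqmf.synthesis(bands_hat)  # back to [batch, time, 1]
```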

The only major non-trainable components are the PQMF analysis ("preprocessor" in the plot) and synthesis. These could probably be sped up further as well.
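
One possible direction (an assumption on my part, not something measured yet) is to express the PQMF analysis as a single strided convolution with the fixed filter bank, which vectorizes well. A sketch, assuming the cosine-modulated filters are precomputed elsewhere:

```python
import tensorflow as tf

def pqmf_analysis(audio, filters):
    """PQMF analysis as one strided convolution.

    audio:   [batch, time, 1]
    filters: [taps, 1, n_bands], the fixed (non-trainable) filter bank;
             its design is out of scope for this sketch.
    Returns [batch, ceil(time / n_bands), n_bands]."""
    n_bands = filters.shape[-1]
    return tf.nn.conv1d(audio, filters, stride=n_bands, padding="SAME")

def pqmf_synthesis(bands, filters):
    """Inverse step as a transposed convolution (sketch only; a real
    implementation uses a matching synthesis filter bank)."""
    n_bands = filters.shape[-1]
    output_shape = tf.stack([tf.shape(bands)[0],
                             tf.shape(bands)[1] * n_bands, 1])
    return tf.nn.conv1d_transpose(bands, filters, output_shape,
                                  strides=n_bands, padding="SAME")
```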

The conclusion is that we should work on speeding up RAVE, if only because the focus is on accelerating neural network synthesis, and there is more to be gained in RAVE by accelerating the ML parts of the model.