spotify / pedalboard

🎛 🔊 A Python library for audio.
https://spotify.github.io/pedalboard
GNU General Public License v3.0
4.96k stars 249 forks source link

Any `Plugin` cannot process stereo buffers containing only 2 samples #336

Open mttbernardini opened 3 weeks ago

mttbernardini commented 3 weeks ago

Problem

I'm developing a DSP graph using pedalboard. It basically consist of reading arbitrary audio files at configurable chunk size, processing them with a Chain and then writing back the output to file system. While writing unit tests I hit this puzzling behaviour that prevents me from moving forward.

If any Plugin.process(...) implementation receives a buffer that has shape (2, 2), it will fail with

RuntimeError: Unable to determine channel layout from shape!

This is not really an uncommon scenario, and can even be replicated with the first example provided in the Pedalboard website, by having a stereo some-file.wav at sample rate x that is exactly x+2 samples long. In general, if any chosen buffer reading size happens to cause a reminder of 2 samples over the audio file being read, the square matrix condition is met and no processing can be done in pedalboard°.

Stack trace

Inspecting the pedalboard source code, I recreated the C++ call stack, which is not shown in Python (most recent call last):

https://github.com/spotify/pedalboard/blob/273eb538193b6e3557c92745a6dbbf6f1d627481/pedalboard/process.h#L260-L261

https://github.com/spotify/pedalboard/blob/273eb538193b6e3557c92745a6dbbf6f1d627481/pedalboard/process.h#L166

https://github.com/spotify/pedalboard/blob/273eb538193b6e3557c92745a6dbbf6f1d627481/pedalboard/BufferUtils.h#L44-L45

Specifically, the last line is hit whenever Plugin.process(...) is called with a buffer that has shape (n, n) (i.e. a square matrix).

Proposed solution

Is it possible to change the function signature of Plugin.process(...) so that it can accept an explicit ChannelLayout? The semantics would be:

Since ReadableAudioFile.read() reliably returns non-interleaved buffers, then to me it makes sense to enforce the same layout when processing, leaving auto-detection for other scenarios / backwards compatibility?

I can help implementing this if needed 👍

° the workaround would be to check the buffer and if squared then append an extra padding sample to make it rectangular, but this seems less than ideal and unnecessarily complicates client code.

mttbernardini commented 3 weeks ago

Adding to this:

import numpy as np
from pedalboard import AudioFile

with AudioFile("test.wav", "w", 48000, 1) as f:
    f.write(np.random.rand(1, 1))

Also fails for the same ChannelLayout detection. Of course this is an unrealistic minimal example, one has to consider instead a process loop where a chunk gets written to file: in that scenario, this bug can happen if the processing/stream size combination causes an iteration with a chunk of only one sample.

While in the 2x2 scenario the workaround would be to write one sample at a time (i.e. two buffers of shape (2, 1)), in the 1x1 no workaround is possible (writing an additional sample may not be acceptable if client code requires invariance of number of samples between pre-process and post-process).

psobot commented 3 weeks ago

Thanks for the super-detailed and well written bug report @mttbernardini!

You're totally right: the auto-detection of channel layout is a bit of a pain. I see a couple fixes that could be made, like you suggest:

mttbernardini commented 2 weeks ago

Thank you for your reply @psobot.

I like the solution you proposed on point (2) and I believe having a scenario in which a Plugin or WritableAudioFile ever processes/writes just 1 or 2 samples is quite unlikely (it's more likely to process/writes 1 or 2 samples as a remainder after several big chunks). To be sure, we may want to specify this behaviour in the documentation and possibly make the raised RuntimeError more detailed (or with a link to the docs if the description is too long).

I'm not sure I follow you for point (1) though. The behaviour of WritableAudioFile.write() seems to match the behaviour of Plugin.process(), i.e. raises a RuntimeError when the input buffer is shaped (n, n). How would the channel count of the WritableAudioFile help in detecting the channel layout? A stereo WritableAudioFile that receives a (2, 2) buffer would still fail. Rather, I believe your solution in point (2) can be adopted for WritableAudioFile as well.

Let me know your thoughts and if you'd like me to implement this.