Any `Plugin` cannot process stereo buffers containing only 2 samples

Problem

I'm developing a DSP graph using pedalboard. It basically consist of reading arbitrary audio files at configurable chunk size, processing them with a Chain and then writing back the output to file system. While writing unit tests I hit this puzzling behaviour that prevents me from moving forward.

If any Plugin.process(...) implementation receives a buffer that has shape (2, 2), it will fail with

RuntimeError: Unable to determine channel layout from shape!

This is not really an uncommon scenario, and can even be replicated with the first example provided in the Pedalboard website, by having a stereo some-file.wav at sample rate x that is exactly x+2 samples long. In general, if any chosen buffer reading size happens to cause a reminder of 2 samples over the audio file being read, the square matrix condition is met and no processing can be done in pedalboard°.

Stack trace

Inspecting the pedalboard source code, I recreated the C++ call stack, which is not shown in Python (most recent call last):

py::array_t<float> process(...) defined in process.h

https://github.com/spotify/pedalboard/blob/273eb538193b6e3557c92745a6dbbf6f1d627481/pedalboard/process.h#L260-L261

py::array_t<float> processFloat32(...) defined in process.h

https://github.com/spotify/pedalboard/blob/273eb538193b6e3557c92745a6dbbf6f1d627481/pedalboard/process.h#L166

template <typename T> ChannelLayout detectChannelLayout(...) defined in BufferUtils.h

https://github.com/spotify/pedalboard/blob/273eb538193b6e3557c92745a6dbbf6f1d627481/pedalboard/BufferUtils.h#L44-L45

Specifically, the last line is hit whenever Plugin.process(...) is called with a buffer that has shape (n, n) (i.e. a square matrix).

Proposed solution

Is it possible to change the function signature of Plugin.process(...) so that it can accept an explicit ChannelLayout? The semantics would be:

if None (default when not provided), then autodetect (existing behaviour, backwards compatible)
if provided, then skip autodetection (thus, accepts 2x2 buffers)

Since ReadableAudioFile.read() reliably returns non-interleaved buffers, then to me it makes sense to enforce the same layout when processing, leaving auto-detection for other scenarios / backwards compatibility?

I can help implementing this if needed 👍

° the workaround would be to check the buffer and if squared then append an extra padding sample to make it rectangular, but this seems less than ideal and unnecessarily complicates client code.

Adding to this:

import numpy as np
from pedalboard import AudioFile

with AudioFile("test.wav", "w", 48000, 1) as f:
    f.write(np.random.rand(1, 1))

Also fails for the same ChannelLayout detection. Of course this is an unrealistic minimal example, one has to consider instead a process loop where a chunk gets written to file: in that scenario, this bug can happen if the processing/stream size combination causes an iteration with a chunk of only one sample.

While in the 2x2 scenario the workaround would be to write one sample at a time (i.e. two buffers of shape (2, 1)), in the 1x1 no workaround is possible (writing an additional sample may not be acceptable if client code requires invariance of number of samples between pre-process and post-process).

Thanks for the super-detailed and well written bug report @mttbernardini!

You're totally right: the auto-detection of channel layout is a bit of a pain. I see a couple fixes that could be made, like you suggest:

For the (1, n) or (n, 1) cases, auto-detection of layout could be more informed by the channel count of the AudioFile object being used. This is a bug.
For the square input case (i.e.: (x, x)) things are a little more tricky. While we could add an explicit ChannelLayout parameter as you suggest, I pretty strongly want to avoid adding additional parameters if possible.

I totally forgot that I'd done this, but there's an example of how we can handle this in the Pedalboard codebase already: StreamResampler caches the last-detected channel layout and uses the last-provided layout if layout detection fails for a provided input. I think this approach would handle most real-world use cases, although it would still not solve the case in which the first buffer is square in shape.

Thank you for your reply @psobot.

I like the solution you proposed on point (2) and I believe having a scenario in which a Plugin or WritableAudioFile ever processes/writes just 1 or 2 samples is quite unlikely (it's more likely to process/writes 1 or 2 samples as a remainder after several big chunks). To be sure, we may want to specify this behaviour in the documentation and possibly make the raised RuntimeError more detailed (or with a link to the docs if the description is too long).

I'm not sure I follow you for point (1) though. The behaviour of WritableAudioFile.write() seems to match the behaviour of Plugin.process(), i.e. raises a RuntimeError when the input buffer is shaped (n, n). How would the channel count of the WritableAudioFile help in detecting the channel layout? A stereo WritableAudioFile that receives a (2, 2) buffer would still fail. Rather, I believe your solution in point (2) can be adopted for WritableAudioFile as well.

Let me know your thoughts and if you'd like me to implement this.

spotify / pedalboard