spotify / pedalboard

🎛 🔊 A Python library for audio.
GNU General Public License v3.0

Can I load a PyDub Audio Segment and pass it through pedalboard? #322

Closed shuZro closed 1 month ago

shuZro commented 1 month ago

Can I load a PyDub Audio Segment and pass it through pedalboard?

psobot commented 1 month ago

Hi @shuZro!

Yes, PyDub segments can be read in as samples with the get_array_of_samples function, which can then be converted into a floating-point audio array:

import numpy as np
import pydub

seg = pydub.AudioSegment.from_ogg("foobar.ogg")
array = seg.get_array_of_samples()

# Convert to NumPy
np_array = np.array(array)

# Convert to floating-point:
float_array = np_array / max(abs(np.iinfo(np_array.dtype).min), abs(np.iinfo(np_array.dtype).max))

# Convert from interleaved data to (num_channels, num_samples)
audio = float_array.reshape([-1, seg.channels]).T
samplerate = seg.frame_rate

# Now just use audio and samplerate to interact with Pedalboard APIs!
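
For reuse, the steps above can be wrapped in one function. This is only a sketch: samples_to_float is a hypothetical name, not part of PyDub or Pedalboard, and the array below is a made-up stand-in for seg.get_array_of_samples() on a 16-bit stereo segment. It also casts to float32, the format Pedalboard works with, rather than leaving NumPy's default float64.

```python
from array import array

import numpy as np


def samples_to_float(samples, num_channels):
    """Hypothetical helper: interleaved integer samples -> float32 array
    shaped (num_channels, num_samples), scaled to roughly [-1.0, 1.0]."""
    np_array = np.asarray(samples)
    if not np.issubdtype(np_array.dtype, np.integer):
        raise TypeError("expected integer PCM samples")
    info = np.iinfo(np_array.dtype)
    # Divide by the largest magnitude the integer type can hold (32768 for int16).
    scale = max(abs(int(info.min)), abs(int(info.max)))
    float_array = np_array.astype(np.float32) / scale
    # De-interleave: [L0, R0, L1, R1, ...] -> [[L0, L1, ...], [R0, R1, ...]]
    return float_array.reshape(-1, num_channels).T


# Stand-in for seg.get_array_of_samples() on a 16-bit stereo segment (assumed data).
fake_samples = array("h", [32767, -32768, 0, 16384])
audio = samples_to_float(fake_samples, num_channels=2)
```

With a real segment, this would presumably be called as samples_to_float(seg.get_array_of_samples(), seg.channels).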

...but I would not recommend doing this. PyDub is a convenient framework, but it requires loading entire AudioSegment objects into memory, which is both slow and wasteful. If you have an audio file on disk or in memory, use pedalboard.io.AudioFile instead, which lets you treat the file just like a regular Python open file object:

from pedalboard.io import AudioFile

with AudioFile("foobar.ogg") as f:
    audio = f.read(f.samplerate * 10)  # read 10 seconds
    f.seek(f.samplerate * 60 * 2)  # seek to the 2-minute mark
    audio = f.read(f.samplerate * 10)  # read from 2:00 to 2:10
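
The read and seek calls on AudioFile count in frames (one sample per channel) rather than seconds, so a position is the time in seconds multiplied by the sample rate. A small sketch of that arithmetic follows. The seconds_to_frames helper is hypothetical, and the 44.1 kHz rate is assumed purely for illustration:

```python
def seconds_to_frames(seconds, samplerate):
    """Hypothetical helper: duration in seconds -> frame count (samples per channel)."""
    return int(round(seconds * samplerate))


samplerate = 44100  # assumed sample rate, for illustration only
ten_seconds = seconds_to_frames(10, samplerate)          # frames to read for 10 s
two_minute_mark = seconds_to_frames(2 * 60, samplerate)  # frame offset of 2:00
```

For a real file, the rate would come from f.samplerate rather than a hard-coded value.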

shuZro commented 1 month ago

@psobot Thanks! One other question. I wanted to convert the output from pedalboard to an Audio Segment. But when doing so it gets all distorted. Any ideas? Here is a snippet:

audio = effect_board(audio, samplerate)

    return AudioSegment(

Also, my original audio was 16-bit integer (int16), so it would be great if the output could be in that format too. I tried this as well, but the audio comes out silent:

a = np.array(audio, dtype=np.int16)
    new = AudioSegment(

psobot commented 1 month ago
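
The int16 attempt most likely comes out silent because np.array(audio, dtype=np.int16) casts the floats directly: Pedalboard's output sits in roughly [-1.0, 1.0], and a bare float-to-integer cast truncates toward zero, so almost every sample becomes 0. A minimal sketch of the effect, using made-up sample values rather than real audio:

```python
import numpy as np

# Float audio in [-1.0, 1.0], shaped (num_channels, num_samples); values are illustrative.
audio = np.array([[0.0, 0.5, -0.5, 0.999]], dtype=np.float32)

# A bare cast truncates toward zero, so every sample with magnitude below 1.0 becomes 0.
cast_only = np.array(audio, dtype=np.int16)

# Scaling to the int16 range before casting keeps the signal.
scaled = (audio * 32767).astype(np.int16)
```

Here cast_only contains only zeros, while scaled keeps the shape of the waveform.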

You can convert a 32-bit floating-point audio buffer (what Pedalboard uses) to a 16-bit signed interleaved integer representation by doing the opposite of the PyDub-to-float conversion from my earlier comment:

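import numpy as np
from pydub import AudioSegment

audio: np.ndarray = ...  # float32 audio shaped (num_channels, num_samples), as returned by Pedalboard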

target_dtype = np.int16

# Convert to fixed-point by scaling to the maximum value of an int and then converting to int:
int_array = (audio * min(abs(np.iinfo(target_dtype).min), abs(np.iinfo(target_dtype).max))).astype(target_dtype)

# Switch from split-channel (num_channels, num_samples) to interleaved (num_samples, num_channels):
interleaved_int_array = int_array.T

# ...and pack into an AudioSegment:
seg = AudioSegment(
    sample_width=np.iinfo(target_dtype).bits // 8,