Hi @shuZro!
Yes, PyDub segments can be read in as samples with the get_array_of_samples
method, and those samples can then be converted into a floating-point audio array:
import numpy as np
import pydub
seg = pydub.AudioSegment.from_ogg("foobar.ogg")
array = seg.get_array_of_samples()
# Convert to NumPy
np_array = np.array(array)
# Convert to floating-point:
float_array = np_array / max(abs(np.iinfo(np_array.dtype).min), abs(np.iinfo(np_array.dtype).max))
# Convert from interleaved data to (num_channels, num_samples)
audio = float_array.reshape([-1, seg.channels]).T
samplerate = seg.frame_rate
# Now just use audio and samplerate to interact with Pedalboard APIs!
...but I would not recommend doing this. PyDub is a convenient framework, but requires loading entire AudioSegment
objects into memory, which is both slow and wasteful. If you have an audio file on disk or in memory, use pedalboard.io.AudioFile
to treat the file just like a regular Python open file object instead:
from pedalboard.io import AudioFile
with AudioFile("foobar.ogg") as f:
    audio = f.read(f.samplerate * 10) # read 10 seconds
    f.seek(f.samplerate * 60 * 2) # seek to the 2-minute mark
    audio = f.read(f.samplerate * 10) # read from 2:00 to 2:10
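If the audio only exists as a PyDub segment, the int-to-float conversion from the first snippet can be wrapped in a small helper. This is a minimal sketch, not part of Pedalboard or PyDub: the name interleaved_ints_to_float is hypothetical, and it assumes the samples are interleaved integer PCM (what get_array_of_samples returns). The demo uses synthetic data so it runs without an audio file.

```python
import numpy as np


def interleaved_ints_to_float(samples, channels: int) -> np.ndarray:
    """Hypothetical helper: interleaved integer PCM -> float32 (num_channels, num_samples)."""
    ints = np.asarray(samples)
    if not np.issubdtype(ints.dtype, np.integer):
        raise TypeError("expected integer PCM samples")
    info = np.iinfo(ints.dtype)
    # Largest magnitude the integer type can hold (32768 for int16)
    scale = max(abs(info.min), abs(info.max))
    floats = ints.astype(np.float32) / np.float32(scale)
    # De-interleave: (num_samples, num_channels) -> (num_channels, num_samples)
    return floats.reshape(-1, channels).T


# Synthetic stereo data, two frames laid out as L, R, L, R
demo = np.array([0, 16384, -32768, 32767], dtype=np.int16)
audio = interleaved_ints_to_float(demo, channels=2)
```

With a real segment, the call would presumably look like interleaved_ints_to_float(seg.get_array_of_samples(), seg.channels), with seg.frame_rate as the sample rate.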
@psobot Thanks! One other question: I want to convert the output from Pedalboard to an AudioSegment, but when I do, the audio comes out distorted. Any ideas? Here is a snippet:
audio = effect_board(audio, samplerate)
return AudioSegment(
    audio.tobytes(),
    sample_width=4,
    frame_rate=samplerate,
    channels=2
)
Also, my original audio was 16-bit integer (int16) audio, so ideally the output would be in that format too. I tried this as well, but the audio is silent:
a = np.array(audio, dtype=np.int16)
new = AudioSegment(
    a.tobytes(),
    sample_width=2,
    frame_rate=samplerate,
    channels=2
)
You can convert a 32-bit floating-point audio buffer (what Pedalboard uses) to a 16-bit signed interleaved integer representation by doing the opposite of what's done in the code above:
import numpy as np
from pydub import AudioSegment

audio: np.ndarray = ...  # float32 array of shape (num_channels, num_samples)
target_dtype = np.int16
# Convert to fixed-point by scaling to the maximum value of an int and then converting to int:
int_array = (audio * min(abs(np.iinfo(target_dtype).min), abs(np.iinfo(target_dtype).max))).astype(target_dtype)
# Switch from split-channel (num_channels, num_samples) to interleaved (num_samples, num_channels):
interleaved_int_array = int_array.T
# ...and pack into an AudioSegment:
seg = AudioSegment(
    interleaved_int_array.tobytes(),
    sample_width=np.iinfo(target_dtype).bits // 8,
    frame_rate=samplerate,
    # After the transpose the shape is (num_samples, num_channels)
    channels=interleaved_int_array.shape[1]
)
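For context on why the two earlier attempts failed: a plain cast of floats in [-1, 1] to int16 truncates nearly every sample to 0 (hence the silence), and passing float32 bytes with sample_width=4 makes PyDub read them as 32-bit integers (hence the distortion). The sketch below illustrates the first point and wraps the scaling step in a helper. The name float_to_interleaved_int16 is hypothetical, and the clipping step is an added assumption, on the basis that effects can push samples outside [-1, 1] and an unclipped cast would overflow int16.

```python
import numpy as np


def float_to_interleaved_int16(audio: np.ndarray) -> bytes:
    """Hypothetical helper: float (num_channels, num_samples) -> interleaved int16 bytes."""
    # Assumption: clip first so out-of-range samples cannot overflow int16
    clipped = np.clip(audio, -1.0, 1.0)
    ints = (clipped * np.iinfo(np.int16).max).astype(np.int16)
    # Transpose to (num_samples, num_channels) so the bytes come out interleaved
    return ints.T.tobytes()


audio = np.array([[0.5, -0.5], [0.25, -0.25]], dtype=np.float32)

# A plain cast without scaling truncates toward zero: every sample becomes 0
unscaled = audio.astype(np.int16)

# Scaling first preserves the signal
raw = float_to_interleaved_int16(audio)
decoded = np.frombuffer(raw, dtype=np.int16)

# Out-of-range input is clipped to full scale instead of overflowing
loud = np.array([[1.5, -1.5]], dtype=np.float32)
loud_decoded = np.frombuffer(float_to_interleaved_int16(loud), dtype=np.int16)
```

The resulting bytes could then be passed to AudioSegment with sample_width=2 and channels=audio.shape[0].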
Can I load a PyDub Audio Segment and pass it through pedalboard?