Is your feature request related to a problem? Please describe.
We currently have two interfaces for ingesting audio: load and stream. Aside from the obvious functional differences between the two, a key difference is that the stream interface does not support resample-on-load, while load has this enabled by default. This is primarily because the resampling libraries we support (scipy, resampy, samplerate, soxr) do not all support stream-based processing, though the latter two do.
It would be great if we could smooth this gap over, though it does introduce some technical hurdles.
Describe the solution you'd like
For a restricted class of res_type values (namely, the samplerate and soxr resamplers), we could in principle support resample-on-load within stream. We could think of this as a second layer of processing, sitting between the soundfile chunk generator and our block generator, that handles the conversion.
From an interface perspective, this ought to be relatively straightforward, with the caveat that block and frame parameters should be understood to operate at the target sample rate, not the native rate.
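To make the "target rate, not native rate" caveat concrete, here is a rough sketch of the bookkeeping involved. The helper name is hypothetical, and it assumes a block spans frame_length + (block_length - 1) * hop_length samples, all counted at the target rate; it ignores whatever extra context a particular resampler might need.

```python
def native_samples_per_block(block_length, frame_length, hop_length,
                             sr_native, sr_target):
    """Estimate how many native-rate samples cover one block (hypothetical helper).

    Assumes block_length counts frames, and frame_length / hop_length count
    samples at the target rate.
    """
    # Overlapping frames: the first frame is full length, each later frame adds one hop
    target_samples = frame_length + (block_length - 1) * hop_length
    # Integer ceiling of target_samples * sr_native / sr_target
    return -(-target_samples * sr_native // sr_target)


# Example: 256 frames per block, 2048-sample frames, hop of 512
print(native_samples_per_block(256, 2048, 512, sr_native=44100, sr_target=22050))
```

With a non-integer rate ratio the count is rounded up, so the number of native samples consumed per block would not be constant from block to block once fractional positions are tracked.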
This does raise a question of whether we could guarantee numerical equivalence (assuming the same resamplers are used) between load and stream. Getting this right might require some modification of the chunking parameters in soundfile to ensure that the resampler has enough future context to perform the interpolation. That in turn will depend on the specifics of each resampler.
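As a toy illustration of the layering and of the equivalence question, the sketch below uses plain linear interpolation as a stand-in resampler. This is not what soxr or samplerate do, and all function names here are hypothetical. Linear interpolation needs only one sample of future context; real resamplers with longer filters would need more, but the structure is the same: the middle layer carries state across chunk boundaries, and the block generator regroups its output at the target rate.

```python
import math


def linear_resampler(chunks, sr_in, sr_out):
    """Resample a stream of chunks by linear interpolation, carrying state across chunks."""
    buf = []    # input samples that later outputs may still need
    start = 0   # absolute input index of buf[0]
    n = 0       # absolute index of the next output sample
    for chunk in chunks:
        buf.extend(chunk)
        out = []
        while True:
            # Output n sits at input position n * sr_in / sr_out = i + rem / sr_out
            i, rem = divmod(n * sr_in, sr_out)
            if i + 1 >= start + len(buf):
                break  # the right-hand neighbour has not arrived yet
            a, b = buf[i - start], buf[i + 1 - start]
            out.append(a + (b - a) * rem / sr_out)
            n += 1
        # Discard inputs that no future output depends on
        keep_from = min((n * sr_in) // sr_out, start + len(buf))
        del buf[: keep_from - start]
        start = keep_from
        yield out


def reblock(chunks, block_length):
    """Regroup variable-length chunks into fixed-length blocks (the last may be short)."""
    pending = []
    for chunk in chunks:
        pending.extend(chunk)
        while len(pending) >= block_length:
            yield pending[:block_length]
            del pending[:block_length]
    if pending:
        yield pending


sr_in, sr_out = 44100, 16000
signal = [math.sin(0.01 * k) for k in range(1000)]
native_chunks = [signal[k:k + 64] for k in range(0, len(signal), 64)]

# "load"-style: the whole signal in one pass
one_shot = [y for out in linear_resampler([signal], sr_in, sr_out) for y in out]

# "stream"-style: native chunks -> resampling layer -> blocks at the target rate
blocks = list(reblock(linear_resampler(native_chunks, sr_in, sr_out), 128))
streamed = [y for block in blocks for y in block]

# Naive streaming: resample each chunk independently, with no shared context
naive = [y for c in native_chunks
         for out in linear_resampler([c], sr_in, sr_out) for y in out]

print(streamed == one_shot, naive == one_shot)
```

In this toy setting the stateful stream reproduces the one-shot result exactly, while resampling each chunk independently loses samples at every boundary. Whether the same holds for a given real resampler depends on how it handles filter state and buffering, which is exactly the per-resampler detail mentioned above.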
Describe alternatives you've considered
:shrug:
Additional context
If at all possible, I'd like to strive for API consistency here. This will require some hard decisions:
If load resamples by default, should stream also?
I'd argue yes, even though we don't yet. This will break API.
If stream resamples, should it use the same default implementation as load?
Again, I'd argue yes. However, the default resampler (resampy) does not support streaming (and won't) - so we'll have to change the default. This will break API.
If we change the default resampler, what are the implications for package dependencies?
At present, resampy is the default because it's simple, reliably packaged, permissively licensed, and cross-platform. Back in 2016, it was the only library that checked all these boxes, but it's a different world now.
Off the cuff, my preference would be to move to soxr as the default. The core library is very mature, though the Python wrapper is young and has a small bus factor.
Samplerate would also be fine, though in my experience, soxr is both more accurate and faster.
Would we relegate resampy to an optional dependency, and promote (soxr|samplerate) to a strict requirement? (I think yes - I don't want to increase the number of strict requirements, and would prefer to reduce them when possible.)