librosa / librosa

Python library for audio and music analysis
https://librosa.org/
ISC License
7.09k stars 960 forks source link

VQT wavelet filters with more graceful octave boundaries #1676

Closed bmcfee closed 1 year ago

bmcfee commented 1 year ago

Is your feature request related to a problem? Please describe.

In https://github.com/librosa/librosa/issues/1073#issuecomment-968752759 , we sorted out how to handle bandwidth calculation for wavelet filters at the boundaries of the frequency range. This resulted in the following chunk of code:

https://github.com/librosa/librosa/blob/1f770e8fa778a548111204192757043d0313a010/librosa/filters.py#L797-L804

where we estimate the "local bins per octave" in the neighborhood of the center frequency for each filter. For "interior frequencies" (i.e. not fmin or fmax), we use one above and one below to do this; for the boundary frequencies, we just use one above (fmin) or one below (fmax).

This is fine if the frequencies span the entire range we care about, but we also often generate filters one octave at a time. In this case, we'll get slightly different results between the full-range and octave-range calculations when the frequency spacing is uneven.

Describe the solution you'd like

We could refine this a little by allowing the length calculator to assume octave equivalence above or below (or both) the frequency range. That is, instead of

bpo[0] = 1 / (logf[1] - logf[0]) 

for the BPO estimate at f[0], we could do

bpo[0] = 2 / (logf[1] - (logf[-1] - 1))  # divide fmax/2 to get one freq down from f[0]

and likewise for the top frequency:

bpo[0] = 1 / (logf[1] - logf[0]) 

becomes

bpo[-1] = 2 / (logf[0] + 1 - logf[-2]) 

These behaviors should be controlled by flags to enable/disable octave equivalence on each side. The result should ensure consistent behavior between full-range and octave-range frequency generation, except at the bottom of the lowest or top of the highest octaves.

Note: when using equal spacing (ie CQT or VQT with intervals='equal'), none of this should matter, and the behavior should be identical in all cases.

Describe alternatives you've considered :shrug:

Additional context Since JIT is experimental right now, I think we can tweak this behavior in a minor without jumping revisions.

bmcfee commented 1 year ago

Thinking on this a bit more, I have an alternative solution that might have the same effect without adding complexity to the filter constructor.

Working through the VQT function, we first construct the lengths (at native sampling rate) of the entire filter-bank to check Nyquist and determine if/how much early downsampling can be implemented: https://github.com/librosa/librosa/blob/1f770e8fa778a548111204192757043d0313a010/librosa/core/constantq.py#L947-L956

In doing so, we pass in an ET-derived α: https://github.com/librosa/librosa/blob/1f770e8fa778a548111204192757043d0313a010/librosa/core/constantq.py#L1481-L1494

This α is ignored and clobbered when there is more than one frequency: https://github.com/librosa/librosa/blob/1f770e8fa778a548111204192757043d0313a010/librosa/filters.py#L795-L808

Later on in the octave loop, we construct each octave's filterbanks separately: https://github.com/librosa/librosa/blob/1f770e8fa778a548111204192757043d0313a010/librosa/core/constantq.py#L985-L1005 and https://github.com/librosa/librosa/blob/1f770e8fa778a548111204192757043d0313a010/librosa/core/constantq.py#L1044-L1067 That same dummy alpha variable is propagated through, and only exists to catch the cases where we end up with a single frequency in some octave.

I'm thinking we can resolve this more elegantly by correctly calculating the alpha array up front (we're implicitly doing that anyway) and then providing slices for each octave just like we do with the frequencies. All we'd really have to do is not ignore the provided alpha in filters.wavelet_lengths, and factor out the calculation to its own helper function. We already have such a helper function in core.constantq.__bpo_to_alpha, it's just not as smart as it should be.

The result of this means that we don't need any special-case octave folding logic in the wavelet constructor. What do you think @lostanlen ?

bmcfee commented 1 year ago

To summarize above, here are the proposed changes:

The only user-facing API change here is that alpha may no longer be ignored by wavelet_lengths if it is provided. If a user wants this ignoring behavior, they can pass in alpha=None (which is the default anyway). And, of course, the filter bandwidths will be ever-so-slightly different around octave boundaries when using non-uniform spacings.

These changes both seem acceptable to me for a minor revision.