Open d-dawg78 opened 1 week ago
Awesome contribution! A bunch of `torch.ones` tensors are initialized on the CPU regardless of the input tensor's device. Also, it would be nice to have an inverse VQT function as well. Also also, do you know a set of parameters that would give a perfect or nearly perfect reconstruction? I had to fiddle with the filter-length code to get something that was even close, but there's still an upper-frequency buzzing sound and increased loudness at the start and end. I also noticed that my 262,144-sample input to a CQT with hop_size 512 had an output size of 513 instead of 512 unless I set the hop_size to 513, but that may be a result of the aforementioned fiddling.
Hey, here's to addressing the feedback ☝️
Good catch on the `torch.ones` front - the most recent commit should address this issue.
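For anyone hitting the same device mismatch, here is a minimal sketch of the pattern in question. It is not the code from this PR, and the helper names `ones_on_cpu` and `ones_like_input` are hypothetical.

```python
import torch


def ones_on_cpu(x: torch.Tensor, n: int) -> torch.Tensor:
    # Problematic pattern: x is ignored, so the result always lives on the
    # default device (normally the CPU) with the default dtype.
    return torch.ones(n)


def ones_like_input(x: torch.Tensor, n: int) -> torch.Tensor:
    # Device-aware pattern: inherit device and dtype from the input tensor.
    return torch.ones(n, device=x.device, dtype=x.dtype)


x = torch.zeros(4, dtype=torch.float64)  # could equally be a CUDA tensor
w = ones_like_input(x, 8)
print(w.device, w.dtype)
```

With a CUDA input, the first helper would produce a CPU tensor and the next arithmetic operation against `x` would fail with a device mismatch, whereas the second helper follows the input wherever it lives.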
We are following the `librosa` VQT, CQT, and iCQT algorithms, and they opted not to implement the inverse VQT for good reason. I think we should do the same, at least for now.
Here are parameters that led to decent waveform reconstruction on my end:
SAMPLE_RATE = 16000
HOP_LENGTH = 256
F_MIN = 32.703
N_BINS = 672
BINS_PER_OCTAVE = 96
Increasing `N_BINS` and `BINS_PER_OCTAVE` accordingly increases the CQT resolution, and by extension the reconstruction is much better 🙂
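To see what these parameters cover, here is a small sketch that assumes the usual geometric bin spacing `f_k = F_MIN * 2 ** (k / BINS_PER_OCTAVE)`. The helper `cqt_frequencies` below is a local illustration, not a function from this PR.

```python
SAMPLE_RATE = 16000
F_MIN = 32.703
N_BINS = 672
BINS_PER_OCTAVE = 96


def cqt_frequencies(f_min: float, n_bins: int, bins_per_octave: int) -> list:
    # Geometrically spaced centre frequencies, one per CQT bin.
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


freqs = cqt_frequencies(F_MIN, N_BINS, BINS_PER_OCTAVE)
n_octaves = N_BINS / BINS_PER_OCTAVE  # octaves spanned by the transform
nyquist = SAMPLE_RATE / 2

print(f"octaves: {n_octaves}, top bin: {freqs[-1]:.1f} Hz, Nyquist: {nyquist} Hz")
```

Under that assumption the bins span 7 octaves and the highest centre frequency stays below the 8 kHz Nyquist limit. Scaling `N_BINS` and `BINS_PER_OCTAVE` together keeps the same frequency range while packing more bins into each octave.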
I don't really have a good answer for the 513-vs-512 output size. It's probably the result of the set of parameters you're using..?
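One possible explanation, offered as an assumption and not as a statement about this PR's code: if the transform uses centred, padded framing in the style of `librosa`'s STFT, the frame count is `1 + n_samples // hop_length`, which would reproduce both numbers reported above. The helper name below is hypothetical.

```python
def centered_frame_count(n_samples: int, hop_length: int) -> int:
    # With centred (padded) framing, one frame sits at sample 0 and one more
    # is added for every full hop that fits into the signal.
    return 1 + n_samples // hop_length


print(centered_frame_count(262_144, 512))
print(centered_frame_count(262_144, 513))
```

If that convention applies here, the extra frame is expected behaviour and not a side effect of the filter-length changes.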
Hey everyone,
I am happy to propose the addition of the `CQT`, `iCQT`, and `VQT`. The first two have been requested by issue 588. Since the CQT is a VQT with parameter `gamma=0`, I figured the VQT should be added to the package too. It also figures quite prominently in the research community, even as a time-frequency representation for neural networks. Here are a few important details.
General
The proposed transforms follow and are tested against the `librosa` implementations. Note that, since the algorithms are based on recursive sub-sampling and the resampling algorithms differ, the results of the proposed transforms and `librosa` gradually diverge as the number of resampling iterations increases. The `librosa` comparison test thresholds are adapted accordingly. The implementation being matched is the following:
The `<ARGUMENTS>` (similar throughout all three transforms) are the controllable ones in the proposed code. The others are "hard-coded". In my opinion, they should stay that way to avoid unnecessary complexity. Future iterations of the transforms could incorporate some of these arguments, however, if requested by the community!
Tests
I was unable to make the transforms torch-scriptable. Maybe this should be the focus of a future PR. For the rest, I was able to test on CPU but not on GPU, for installation reasons. Feel free to let me know if any tests are lacking.
Speed
On the audio snippet from here, over 100 iterations, with `dtype=torch.float64`:
Sanity Check
Here's an image of the CQT-gram generated using the following parameters:
The results are pretty much identical! Feel free to request changes or ask me any questions on this PR. I'll be happy to answer, and am excited to get these transforms to the package 🫡
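On the point that the CQT is a VQT with `gamma=0`, here is a rough sketch of why. It assumes the bandwidth form used in the VQT literature, where each filter's bandwidth is `alpha * f_k + gamma`. The helper name `filter_lengths` and the exact `alpha` formula are illustrative assumptions and are not taken from this PR.

```python
def filter_lengths(freqs, sr, bins_per_octave, gamma=0.0, filter_scale=1.0):
    # Relative bandwidth of one bin for geometric spacing (assumed convention).
    r = 2.0 ** (1.0 / bins_per_octave)
    alpha = (r**2 - 1.0) / (r**2 + 1.0)
    q = filter_scale / alpha
    # gamma widens every filter's bandwidth by a constant number of Hz,
    # which shortens the filters, most noticeably at low frequencies.
    return [q * sr / (f + gamma / alpha) for f in freqs]


sr, bpo = 16000, 12
freqs = [32.703 * 2.0 ** (k / bpo) for k in range(24)]

cqt_lengths = filter_lengths(freqs, sr, bpo, gamma=0.0)
vqt_lengths = filter_lengths(freqs, sr, bpo, gamma=20.0)

# With gamma=0, length * frequency / sr is the same for every bin: constant Q.
q_factors = [n * f / sr for n, f in zip(cqt_lengths, freqs)]
print(min(q_factors), max(q_factors))
```

With `gamma=0` the ratio of centre frequency to bandwidth is identical across bins, which is the defining property of the CQT. A positive `gamma` breaks that constancy and trades some low-frequency resolution for shorter filters.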