Closed timprepscius closed 1 year ago
Hi Tim, this is not really a code issue, but let me attempt an explanation nonetheless: The frequency step (analysis or synthesis) is an implicit variable that can be explained as follows: We think about the (full/undecimated) STFT of a signal of length L as the LxL matrix we get by choosing a hop size of 1 and computing a full-length FFT at each time position. In practice, we decimate that matrix (only compute it for some values), by choosing a hop size a larger than 1 and a number of frequency channels M. Hence, the number of considered time steps N and the frequency step b are determined implicitly as L/a and L/M, respectively.
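To make the implicit quantities concrete, here is a minimal sketch of the two divisions described above. The struct, function, and parameter names are illustrative assumptions, not taken from phaseret or from the Rust port; it only restates N = L/a and b = L/M.

```rust
/// Implicit quantities of a decimated STFT, as described in the reply above.
/// All names here are hypothetical and chosen for illustration only.
#[derive(Debug, PartialEq)]
pub struct ImplicitSteps {
    /// N = L / a: number of time positions that are actually computed.
    pub time_steps: f64,
    /// b = L / M: frequency step between neighbouring channels.
    pub frequency_step: f64,
}

/// `signal_len` is L, `hop` is a, `channels` is M.
pub fn implicit_steps(signal_len: usize, hop: usize, channels: usize) -> ImplicitSteps {
    ImplicitSteps {
        time_steps: signal_len as f64 / hop as f64,
        frequency_step: signal_len as f64 / channels as f64,
    }
}
```

For example, with L = 8192, a = 256 and M = 1024, `implicit_steps(8192, 256, 1024)` gives N = 32 time steps and a frequency step of b = 8.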
In time stretching, we map a signal of length L_a to another signal of different length L_s, hence the frequency steps are different. To perform the correct phase modification, the algorithm needs to be aware of the ratio between the two frequency steps. If you look at the code snippet you sent, then you can see that, in fact, only the ratio between analysis and synthesis frequency step is ever used.
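As a rough sketch of why only the ratio matters, the snippet below computes the ratio of the two frequency steps in two ways. It assumes (this is not stated in the thread) that analysis and synthesis use the same number of channels M and the same number of frames N, so that L_a = N * a_a and L_s = N * a_s. Under that assumption the signal length cancels and b_a / b_s = a_a / a_s. Function names are hypothetical.

```rust
/// Ratio b_a / b_s computed from the two signal lengths,
/// using the same number of channels M on both sides (an assumption).
pub fn step_ratio_from_lengths(analysis_len: usize, synthesis_len: usize, channels: usize) -> f64 {
    let b_a = analysis_len as f64 / channels as f64; // analysis frequency step
    let b_s = synthesis_len as f64 / channels as f64; // synthesis frequency step
    b_a / b_s
}

/// The same ratio from the hop sizes alone, valid when
/// L_a = N * a_a and L_s = N * a_s for a common frame count N.
pub fn step_ratio_from_hops(analysis_hop: usize, synthesis_hop: usize) -> f64 {
    analysis_hop as f64 / synthesis_hop as f64
}
```

With a_a = 256, a_s = 384 and M = 1024, both N = 32 (L_a = 8192, L_s = 12288) and N = 100 (L_a = 25600, L_s = 38400) give the same ratio of 2/3, so although each frequency step individually depends on the signal length, their ratio does not.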
If you have more questions about the PVDR, but not about phaseret directly, please send me an e-mail instead of opening an issue in the toolbox repository. I'm the second author on the paper and the e-mail address is still valid.
Nicki
Hey there,
I'm about to ask a non-well-formed question. I apologize in advance. If it doesn't make sense, feel free to ignore it. But here goes:
So I'm porting a rust implementation of PVDoneRight, mostly because I don't want to look at any GPL code. This rust implementation is really clear and follows your paper.
However there is one variable that doesn't make much sense to me: `analysis_frequency_step`.

`analysis_frequency_step` is calculated by taking the input_length and dividing it by the fft size like so (rust code): `let analysis_frequency_step = input_len as f64 / fft_size as f64`
(this just says analysis_frequency_step = input_len / fft_size, but using floats)

Then later this `analysis_frequency_step` is used while calculating deltas. In the example, `i` corresponds to the frame index and `j` corresponds to the fft frequency value. Essentially he is writing, "if the necessary surrounding frames are not available, then just use frequency_delta_phi, but if they are, calculate the frequency_forward_delta_phi using the phases of the previous frames".

Here is my question: Since the analysis hopsize (the hop between the ffts on the input data) does not depend on the input length, and the synthesis hopsize (the hop between the ffts on the output data) does not depend on the input length either, shouldn't the analysis_frequency_step also be independent of the input length?
Shouldn't `analysis_frequency_step` be somehow a scalar multiplied by synthesis_hop_size?

If you have any hints or insights, I would appreciate them. I realize this is not the code you wrote, however I believe I've summarized what's happening, and as you are the world expert you can probably instantly understand more than I.
Thanks in advance.
-tim
The rust implementation can be found here: https://github.com/Hajime-san/phase-gradient-vocoder/blob/main/src/main.rs