jishnub opened this issue 7 months ago
You can use any pitch computation algorithm in librosa or YAAPT for computing the pitch. There might be minor differences, but they did not affect the generation quality for me. I used pYAAPT because it was faster than librosa. I don't remember the exact parameters I used, but the above code snippet should work just fine.
Let me know if you face any issues using the above code snippet.
This is being read in at https://github.com/iiscleap/ZEST/blob/255a7e506e777bf47fb4b5266dc1a49dff126d60/code/F0_predictor/config.py#L10 and https://github.com/iiscleap/ZEST/blob/255a7e506e777bf47fb4b5266dc1a49dff126d60/code/F0_predictor/pitch_attention_adv.py#L53-L54, and it appears to contain a dictionary of the form
filename: vector
. How are these vectors generated? I assume they would need to be regenerated for audio files not present in the original dataset? Reading the paper, these are probably generated using the YAAPT algorithm. Could you point to existing code for this?
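Assuming the file is a NumPy-pickled dictionary (an assumption based on how it seems to be loaded; the actual ZEST format may differ), regenerating it for new audio amounts to mapping each filename to its F0 vector and saving the dict, e.g.:

```python
import numpy as np

# Hypothetical sketch: build a {filename: f0_vector} dict and persist it
# so that a np.load(..., allow_pickle=True).item() consumer can read it back.
# The F0 values here are dummies; in practice each vector would come from
# running a pitch extractor (e.g. YAAPT) on the corresponding wav file.
f0_dict = {
    "speaker1_utt1.wav": np.array([110.0, 112.5, 0.0, 115.2]),
    "speaker1_utt2.wav": np.array([0.0, 98.7, 101.3]),
}
np.save("f0_dict.npy", f0_dict, allow_pickle=True)

# Loading mirrors the assumed pattern in the repo's config/training code:
loaded = np.load("f0_dict.npy", allow_pickle=True).item()
```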
Is this generated using the `get_yaapt_f0` function? https://github.com/iiscleap/ZEST/blob/255a7e506e777bf47fb4b5266dc1a49dff126d60/code/HiFi-GAN/dataset.py#L27-L43 The results seem a bit different.