Open c1a1o1 opened 4 years ago
Thanks for your interest. You can use an ASR toolkit and take a hidden vector from its acoustic model as the PPG, extract an x-vector as the speaker embedding, and extract F0 using WORLD or REAPER.
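To make the F0 part more concrete, here is a minimal sketch of what a frame-level F0 estimate is. It uses plain autocorrelation, which is NOT the algorithm in WORLD or REAPER (those are considerably more robust); it only illustrates the kind of per-frame value those tools return. The function name `estimate_f0`, the 60 to 400 Hz search range, and the 0.3 voicing threshold are assumptions chosen for illustration.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return an F0 estimate in Hz for one frame, or 0.0 if it looks unvoiced.

    Illustrative autocorrelation sketch only; not the WORLD or REAPER method.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0                         # pure silence

    lag_min = int(sample_rate / f0_max)    # shortest period considered
    lag_max = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        corr = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    # Assumed voicing threshold: weak periodicity is treated as unvoiced.
    return sample_rate / best_lag if best_corr > 0.3 else 0.0


# Synthetic check: 100 ms of a 200 Hz sine sampled at 16 kHz.
SAMPLE_RATE = 16000
sine = [math.sin(2 * math.pi * 200.0 * t / SAMPLE_RATE) for t in range(1600)]
f0_sine = estimate_f0(sine, SAMPLE_RATE)
f0_silence = estimate_f0([0.0] * 1600, SAMPLE_RATE)
```

In practice you would run such an estimator over short overlapping frames to get an F0 contour, with 0 marking unvoiced frames, and use WORLD or REAPER for the real features.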
@hhguo Thanks for your reply! Can you add more details regarding PPG features? What kind of ASR tool did you use and what hidden vectors are you talking about?
It really depends on your ASR model. Usually, we take either the bottleneck features of the ASR model or the output of its softmax layer. You need to compare them and pick the one that generalizes best and is most robust as the PPG feature.
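As a rough illustration of the softmax option, a PPG is a frames-by-classes matrix in which each row is a posterior distribution over phonetic classes. The sketch below assumes you already have per-frame logits from some ASR acoustic model; the phone set `PHONES` and the logit values are made up, and `posteriorgram` is a hypothetical helper, not part of any toolkit.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def posteriorgram(frame_logits):
    """Turn per-frame logits into a PPG: one posterior row per frame."""
    return [softmax(row) for row in frame_logits]


# Hypothetical 3-class phone set and made-up logits for 3 frames.
PHONES = ["sil", "a", "s"]
frame_logits = [
    [4.0, 0.5, 0.1],
    [0.2, 3.0, 0.3],
    [0.1, 0.4, 2.5],
]

ppg = posteriorgram(frame_logits)
# Most likely phone per frame, just to inspect the result.
best_phones = [PHONES[row.index(max(row))] for row in ppg]
```

Bottleneck features would instead be the activations of a narrow hidden layer taken before this softmax, so they are not probabilities and do not sum to 1.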
@hhguo May I ask which ASR model you used? Is it from Kaldi?
@hhguo If possible, could you show us a step-by-step example of how to train from scratch using a tiny portion of an existing dataset? It doesn't need to sound good; it is just to verify whether the code works. That would be an immense help in understanding how it works. Thank you.
Because the data is not public, I haven't uploaded a detailed example. When I have resources available, I will provide one in the next version.
@hhguo Could you run a test on the NUS-48E Sung and Spoken Lyrics Corpus? https://smcnus.comp.nus.edu.sg/nus-48e-sung-and-spoken-lyrics-corpus/
Great work! How do I generate the PPG features, the speaker embedding, and the F0 features?