hhguo / EA-SVC

An implementation of "Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training"
MIT License

Great work! How to make PPG features? #1

Open c1a1o1 opened 4 years ago

c1a1o1 commented 4 years ago

Great work! How do you create the PPG features, the speaker embeddings, and the F0 features?

hhguo commented 3 years ago

Thanks for your interest. You can use an ASR tool to extract a hidden-layer vector from its model as the PPG, extract an X-Vector as the speaker embedding, and extract F0 with WORLD or REAPER.
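For the F0 part, here is a minimal sketch using the pyworld binding of WORLD. It is not necessarily the exact configuration used in this repo; the input file name, the 5 ms frame period, and the `.npy` output format are assumptions for illustration only.

```python
# Minimal F0 extraction sketch with the pyworld binding of WORLD.
# Settings (file name, frame period, output format) are assumed, not taken from this repo.
import numpy as np
import soundfile as sf
import pyworld as pw

wav, sr = sf.read("example.wav")    # hypothetical input file
wav = wav.astype(np.float64)        # WORLD expects float64 samples

# Coarse F0 estimate with DIO, then refinement with StoneMask
f0, timeaxis = pw.dio(wav, sr, frame_period=5.0)   # 5 ms hop (assumed)
f0 = pw.stonemask(wav, f0, timeaxis, sr)

np.save("f0.npy", f0)               # one F0 value per frame
```

REAPER can be used in the same role; it is a standalone binary rather than a Python library, so the glue code would differ.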

jun-danieloh commented 3 years ago

@hhguo Thanks for your reply! Could you add more detail about the PPG features? Which ASR tool did you use, and which hidden vectors are you referring to?

hhguo commented 3 years ago

It really depends on your ASR model. Usually we take either the bottleneck features of the ASR model or the output of its softmax layer. You need to compare them and use the most generalized and robust one as the PPG feature.
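To make the two options concrete, here is a toy illustration (not this repo's pipeline): a stand-in acoustic model in PyTorch, with a forward hook capturing the bottleneck activations and a softmax over the output logits giving the phonetic posteriors. The `ToyAcousticModel`, its layer sizes, and the phone-set size are all hypothetical; in practice you would hook the corresponding layer of your real ASR acoustic model.

```python
# Toy illustration of the two PPG candidates discussed above:
# (1) bottleneck-layer activations, (2) softmax posteriors.
# ToyAcousticModel is a hypothetical stand-in for a real ASR acoustic model.
import torch
import torch.nn as nn

class ToyAcousticModel(nn.Module):
    def __init__(self, n_mels=80, bottleneck_dim=256, n_phones=72):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_mels, 512), nn.ReLU(),
            nn.Linear(512, bottleneck_dim), nn.ReLU(),  # "bottleneck" layer
        )
        self.classifier = nn.Linear(bottleneck_dim, n_phones)

    def forward(self, feats):          # feats: (T, n_mels)
        h = self.encoder(feats)
        return self.classifier(h)      # phone logits: (T, n_phones)

model = ToyAcousticModel().eval()

# Capture the bottleneck activations with a forward hook
captured = {}
def save_bottleneck(module, inputs, output):
    captured["bottleneck"] = output.detach()

model.encoder[2].register_forward_hook(save_bottleneck)  # the bottleneck Linear

with torch.no_grad():
    feats = torch.randn(200, 80)       # 200 frames of dummy acoustic features
    logits = model(feats)

ppg_bottleneck = captured["bottleneck"]        # option 1: (200, 256) bottleneck features
ppg_posterior = torch.softmax(logits, dim=-1)  # option 2: (200, 72) phone posteriors
```

Whichever variant generalizes better across speakers and singing styles is the one to feed into the conversion model as the PPG.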

jun-danieloh commented 3 years ago

@hhguo Can I ask which ASR model you used? Is it from Kaldi?

MaxGodTier commented 3 years ago

@hhguo If possible, could you show us a step-by-step example of training from scratch on a small subset of an existing dataset? It doesn't need to sound good; it's just to verify that the code runs, which would be immensely helpful for understanding how it works. Thank you.

hhguo commented 3 years ago

Because the training data is not public, I haven't uploaded a detailed example. When suitable resources become available, I will provide one in the next version.

c1a1o1 commented 3 years ago

@hhguo Could you run a test on the NUS-48E Sung and Spoken Lyrics Corpus? https://smcnus.comp.nus.edu.sg/nus-48e-sung-and-spoken-lyrics-corpus/