supikiti / Awesome-tts-and-vc

Here is a summary of the conference papers we read.

F0-CONSISTENT MANY-TO-MANY VOICE CONVERSION VIA CONDITIONAL AUTOENCODER #8

Open zbller opened 4 years ago

zbller commented 4 years ago

Link

https://arxiv.org/abs/2004.07370

What is it?

Adds an f0-conditioned input to the decoder of the AutoVC autoencoder.

What is better than prior work?

Prevents f0 flipping in cross-gender VC. The converted speech closely follows the f0 it is conditioned on.

What is the key to the technique and method?

Extract log-f0, normalize it to the range 0 to 1, quantize that range into 256 bins, and feed the result to the decoder as a one-hot input.
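The quantization step can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `quantize_log_f0`, the min-max normalization with the bounds `lo` and `hi`, and the restriction to voiced frames are all assumptions made for the example. The note only states that the 0 to 1 range is split into 256 bins and used as a one-hot input.

```python
import numpy as np

def quantize_log_f0(f0_hz, lo, hi, n_bins=256):
    """Map voiced f0 values (Hz) to bin indices and one-hot vectors.

    lo and hi are assumed log-f0 bounds used for normalization
    (hypothetical parameters, not taken from the paper).
    """
    log_f0 = np.log(np.asarray(f0_hz, dtype=np.float64))
    # Normalize log-f0 to [0, 1], clipping values outside the bounds.
    norm = np.clip((log_f0 - lo) / (hi - lo), 0.0, 1.0)
    # Bin index in [0, n_bins - 1]; norm == 1.0 goes into the last bin.
    idx = np.minimum((norm * n_bins).astype(np.int64), n_bins - 1)
    one_hot = np.zeros((idx.size, n_bins), dtype=np.float32)
    one_hot[np.arange(idx.size), idx] = 1.0
    return idx, one_hot

# Example with three voiced frames and assumed bounds of 100 Hz and 200 Hz.
f0 = [100.0, 150.0, 200.0]
idx, one_hot = quantize_log_f0(f0, lo=np.log(100.0), hi=np.log(200.0))
```

Each row of `one_hot` would then be concatenated with the other decoder inputs frame by frame. How unvoiced frames are represented is left out of this sketch.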

How was it validated?

After cross-gender VC, the authors plotted the f0 distributions and found that the distribution of the converted voice overlaps that of the target speaker, with no peak centered at the f0 of the other gender. In a MOS test, the method scored 3.732 for quality and 3.331 for similarity, while a basic AutoVC scored 3.546 and 3.076 respectively.
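A check of this kind could be approximated numerically as below. This is only a sketch on synthetic data: the histogram overlap score, the helper names, the use of 0 to mark unvoiced frames, and the example f0 values are all assumptions for illustration. The note describes a visual comparison of plotted distributions, not this metric.

```python
import numpy as np

def log_f0_histogram(f0_hz, bins):
    """Normalized histogram of log-f0 over voiced frames."""
    f0 = np.asarray(f0_hz, dtype=np.float64)
    voiced = f0[f0 > 0]  # assumption: unvoiced frames are marked with 0
    hist, _ = np.histogram(np.log(voiced), bins=bins)
    return hist / hist.sum()

def histogram_overlap(p, q):
    """Overlap of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return float(np.minimum(p, q).sum())

# Synthetic f0 tracks (Hz); the centers are made-up illustrative values.
rng = np.random.default_rng(0)
target = np.exp(rng.normal(np.log(210.0), 0.15, 2000))
converted_ok = np.exp(rng.normal(np.log(205.0), 0.15, 2000))
converted_flipped = np.exp(rng.normal(np.log(110.0), 0.15, 2000))

bins = np.linspace(np.log(50.0), np.log(500.0), 65)
h_target = log_f0_histogram(target, bins)
overlap_ok = histogram_overlap(h_target, log_f0_histogram(converted_ok, bins))
overlap_flipped = histogram_overlap(h_target, log_f0_histogram(converted_flipped, bins))
```

A conversion whose f0 stays near the target speaker's range gives a high overlap, while a conversion that keeps the source gender's f0 range gives a low one.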

Is there any discussion?

Papers to read next