Link
https://arxiv.org/abs/2004.07370
What is it?
Adds an F0 conditioning input to the decoder of an AutoVC-style autoencoder for voice conversion (VC).
What is better compared with prior work?
Prevents F0 flipping in cross-gender VC.
Strong compliance with the conditioning F0.
What is the key to the technique and method?
Extract log-F0, quantize the range 0 to 1 into 256 bins, and feed the result to the decoder as a one-hot input.
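The quantization step can be sketched as below. This is a minimal illustration, not the authors' code. The note does not say how log-F0 is mapped into the range 0 to 1, so the normalization here (speaker mean ± 2 standard deviations mapped to [0, 1], then clipped) is an assumption, and the function names are hypothetical. Only voiced frames (F0 > 0) are handled; unvoiced frames are left out of the sketch.

```python
import math

N_BINS = 256  # number of quantization bins stated in the note


def normalize_log_f0(f0_hz, mean, std):
    """Map voiced F0 values (Hz) to [0, 1].

    Assumption: mean +/- 2 std of the speaker's log-F0 is mapped to [0, 1]
    and anything outside is clipped. The note does not specify this step.
    """
    out = []
    for f in f0_hz:
        z = (math.log(f) - mean) / (4.0 * std) + 0.5
        out.append(min(max(z, 0.0), 1.0))
    return out


def quantize(norm_f0, n_bins=N_BINS):
    """Turn values in [0, 1] into integer bin indices 0 .. n_bins - 1."""
    return [min(int(v * n_bins), n_bins - 1) for v in norm_f0]


def one_hot(indices, n_bins=N_BINS):
    """One row per frame, with a single 1.0 at the frame's bin index."""
    rows = []
    for i in indices:
        row = [0.0] * n_bins
        row[i] = 1.0
        rows.append(row)
    return rows
```

For a voiced F0 track `f0`, `one_hot(quantize(normalize_log_f0(f0, mean, std)))` gives one 256-dimensional vector per frame, which would be concatenated with the decoder's other inputs.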
How was it validated?
After cross-gender VC, the authors plotted the F0 distributions and found that the distribution of the converted voice overlaps that of the target speaker, with no peak centered at the F0 of the other gender.
In a MOS test, the method scored 3.732 for quality and 3.331 for similarity, while a basic AutoVC scored 3.546 and 3.076, respectively.
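The distribution check described above can be approximated numerically as follows. This is a hedged sketch: the paper is only said to plot the distributions, so the overlap score, the 50 to 600 Hz range, and the bin count are assumptions added for illustration, and the function names are hypothetical.

```python
import math


def log_f0_histogram(f0_hz, lo=math.log(50.0), hi=math.log(600.0), n_bins=50):
    """Normalized histogram of log-F0 over voiced frames (F0 > 0).

    Assumption: the 50-600 Hz range and 50 bins are illustrative choices.
    """
    counts = [0] * n_bins
    for f in f0_hz:
        if f <= 0:
            continue  # skip unvoiced frames
        x = math.log(f)
        if x < lo or x > hi:
            continue  # ignore values outside the histogram range
        i = min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def histogram_overlap(p, q):
    """Histogram intersection: 1.0 for identical distributions, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(p, q))
```

A converted voice whose F0 stays in the source speaker's register (F0 flipping) would show a low overlap with the target speaker's histogram, while a well-converted voice would score close to 1.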
Is there any discussion?
Papers to read next