Why are you using "use_mel_posterior_encoder"?

p0p4k / vits2_pytorch

unofficial vits2-TTS implementation in pytorch

https://arxiv.org/abs/2307.16430

MIT License


Why are you using "use_mel_posterior_encoder"? #89

Closed Moon-sung-woo closed 2 months ago

Moon-sung-woo commented 2 months ago

Hi, I'm Sungwoo Moon. First of all, thank you for sharing your code.

I'm looking at your code and I'm wondering why you use 'use_mel_posterior_encoder'. As I read the paper, it uses the same spectrogram as vits1, so I wonder whether this makes a difference in TTS performance.

Thank you.

p0p4k commented 2 months ago

In the paper they say they use a mel spectrogram, while vits1 uses a linear spectrogram. Am I missing something? 🧐

lastapple commented 2 months ago

> In the paper they say they use a mel spectrogram, while vits1 uses a linear spectrogram. Am I missing something? 🧐

So is there a big difference in results between these two inputs?

p0p4k commented 2 months ago

I think it's better to use the linear spectrogram, but it's not a big deal.
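For anyone following along, here is a minimal sketch of what the flag changes at the posterior encoder's input. It is not the repo's actual code: `AudioConfig` and `posterior_input` are hypothetical names, and the defaults (`filter_length=1024`, `n_mel_channels=80`) are assumed values typical of VITS-style configs, so check your own config file.

```python
from dataclasses import dataclass


@dataclass
class AudioConfig:
    # Hypothetical container; the default values are assumptions, not repo facts.
    filter_length: int = 1024            # STFT size (n_fft)
    n_mel_channels: int = 80             # number of mel bands
    use_mel_posterior_encoder: bool = False


def posterior_input(cfg: AudioConfig) -> tuple[str, int]:
    """Return (spectrogram kind, channel count) fed to the posterior encoder."""
    if cfg.use_mel_posterior_encoder:
        # Mel input: one channel per mel band.
        return "mel", cfg.n_mel_channels
    # Linear input: a one-sided STFT of size n_fft has n_fft // 2 + 1 bins.
    return "linear", cfg.filter_length // 2 + 1


print(posterior_input(AudioConfig(use_mel_posterior_encoder=True)))   # ('mel', 80)
print(posterior_input(AudioConfig(use_mel_posterior_encoder=False)))  # ('linear', 513)
```

Under these assumed settings the choice is between an 80-channel mel input and a 513-channel linear input. The mel version is a compressed view of the same STFT magnitudes, which fits the comment above that the difference is not a big deal.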

Moon-sung-woo commented 2 months ago

@p0p4k Oh! I'm sorry. I misread the paper. Thank you.