p0p4k / vits2_pytorch

unofficial vits2-TTS implementation in pytorch
https://arxiv.org/abs/2307.16430
MIT License

Necessity of the adversarial duration predictor #11

Closed WendongGan closed 1 year ago

WendongGan commented 1 year ago

"note - duration predictor is not adversarial yet. In my earlier experiments with VITS-1, I used deterministic duration predictor (no-sdp) and found that it is quite good. So, I am not sure if adversarial duration predictor is necessary. But, I will add it sooner or later if it is necessary. Also, I want to combine parallel tacotron-2 and naturalspeech-1's learnable upsampling layer to remove MAS completely for E2E differentiable model."

On this issue, I would like to give my feedback:

  1. The experience of most of us who have trained VITS is that its rhythm is relatively flat and its expressiveness is not enough. I think this problem most likely comes from MAS and the duration predictor.

  2. So, for now, the adversarial duration predictor is an important improvement in VITS2.

  3. Deterministic duration predictor (no-sdp): in the VITS-1 paper, there seems to be an ablation experiment showing that it is not good enough.

  4. StyleTTS2 is better than VITS in rhythmic diversity and expressiveness, which also points to this shortcoming of VITS. https://styletts2.github.io/

  5. So, I think the adversarial duration predictor is necessary. I look forward to you updating the code, and will then experiment and discuss with you.

p0p4k commented 1 year ago

https://arxiv.org/pdf/2206.12132.pdf Section 2.5

WendongGan commented 1 year ago

It may be that each has its advantages and disadvantages; the DDP in VITS is slightly worse, but perhaps it could be trained better.

[image: ablation table from the VITS paper comparing duration predictors]

p0p4k commented 1 year ago

Yes, we have the parts ready; I just have to put them in train.py. Will do it by the weekend. You can still train a no-sdp model and then transfer-learn it into sdp. We can check the performance through our experiments. Also, the problem with papers is that they only give metrics for that particular dataset; sdp might not work well on other datasets, so it is quite variable.

p0p4k commented 1 year ago

Added the adversarial duration predictor; try it and let me know if there are any training errors.
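
For reference, here is a rough, self-contained sketch of the general idea behind an adversarial duration predictor (this is not the repo's actual `DurationDiscriminator`; the module layout, argument names, and LSGAN-style losses are assumptions for illustration): a small convolutional discriminator scores (text hidden states, duration) pairs, and its adversarial losses are added on top of the usual duration loss.

```python
import torch
import torch.nn as nn


class DurationDiscriminatorSketch(nn.Module):
    """Toy duration discriminator: judges whether a per-token duration
    sequence looks real, conditioned on the text-encoder hidden states."""

    def __init__(self, hidden_channels=192, filter_channels=256, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Input is the encoder hidden states concatenated with a 1-channel duration track.
        self.pre = nn.Conv1d(hidden_channels + 1, filter_channels, kernel_size, padding=pad)
        self.convs = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.Conv1d(filter_channels, filter_channels, kernel_size, padding=pad),
            nn.LeakyReLU(0.2),
        )
        self.out = nn.Conv1d(filter_channels, 1, 1)

    def forward(self, h_text, dur, x_mask):
        # h_text: [B, H, T], dur: [B, 1, T], x_mask: [B, 1, T]
        x = torch.cat([h_text, dur], dim=1) * x_mask
        x = self.pre(x)
        x = self.convs(x)
        return self.out(x) * x_mask  # per-token real/fake logits


def dpd_losses(disc, h_text, dur_real, dur_pred, x_mask):
    """LSGAN-style losses. In a real training loop the discriminator loss and
    the generator (duration predictor) loss are stepped by separate optimizers."""
    h = h_text.detach()  # do not push adversarial gradients into the text encoder here

    # Discriminator: real durations -> 1, predicted durations -> 0.
    d_real = disc(h, dur_real, x_mask)
    d_fake = disc(h, dur_pred.detach(), x_mask)
    loss_disc = torch.mean((1.0 - d_real) ** 2) + torch.mean(d_fake ** 2)

    # Generator: try to fool the discriminator with the predicted durations.
    d_fake_g = disc(h, dur_pred, x_mask)
    loss_gen = torch.mean((1.0 - d_fake_g) ** 2)
    return loss_disc, loss_gen
```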

isdanni commented 7 months ago

@p0p4k hey I have one question about the "transfer learn it into sdp..." if you don't mind :smiley: do you mean just continue training with SDP using the checkpoints trained with DP?

p0p4k commented 7 months ago

Yes. Also, I think we can modify the code a little to train both dp and sdp together and compare their performance simultaneously to save time.
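
For what it's worth, a minimal sketch of the "transfer learn it into sdp" idea, assuming a VITS-style checkpoint layout (the `"model"` key and the helper name are illustrative, not the repo's exact API): load the no-sdp checkpoint with `strict=False` so the missing SDP weights keep their fresh initialization, then continue training with `use_sdp` enabled.

```python
import torch


def warm_start_sdp(model, ckpt_path):
    """Load a no-sdp checkpoint into a model built with use_sdp=True.

    The stochastic duration predictor's keys are absent from the old
    checkpoint, so strict=False leaves them freshly initialized while
    everything else (text encoder, flow, decoder, ...) is restored."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("model", ckpt)  # assumed checkpoint layout
    missing, unexpected = model.load_state_dict(state, strict=False)
    print("freshly initialized (missing) keys:", missing)   # expect the sdp entries here
    print("ignored (unexpected) keys:", unexpected)          # e.g. the old dp entries, if dropped
    return model
```

Training dp and sdp together, as suggested above, would then mostly be a matter of computing both duration losses from the same encoder outputs and logging them separately.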

JohnHerry commented 7 months ago

In my experiments, all instances with the DPD (Duration Predictor Discriminator) failed to continue for enough steps; they got stuck. And the results synthesized with the stuck checkpoints were no better than those without the DPD.

As in the vits2 paper, our training pipeline is: train without the DPD to about 700K or 800K steps, then continue training with the DPD for about 30K steps (it mostly got stuck after a few thousand steps). We did not make any special change to the learning rate when continuing; that is, the DPD is trained with the initial lr=0.0002, while the other parts continue with the scheduler-decayed lr at that step.
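
To make the learning-rate situation concrete, here is an illustrative sketch (the checkpoint keys, hyperparameters, and function name are assumptions, not the repo's exact resume code): the resumed generator/discriminator optimizers keep their decayed lr from the checkpoint, while the newly added DPD gets a fresh optimizer at the initial lr=2e-4.

```python
import torch

LR_INIT = 2e-4       # initial learning rate from the config
BETAS = (0.8, 0.99)
EPS = 1e-9


def resume_with_new_dpd(net_g, net_d, net_dur_disc, ckpt_g, ckpt_d):
    """Resume at ~700-800K steps with a freshly added duration discriminator."""
    optim_g = torch.optim.AdamW(net_g.parameters(), LR_INIT, betas=BETAS, eps=EPS)
    optim_d = torch.optim.AdamW(net_d.parameters(), LR_INIT, betas=BETAS, eps=EPS)
    # Restoring the optimizer states also restores their already-decayed lr.
    optim_g.load_state_dict(ckpt_g["optimizer"])
    optim_d.load_state_dict(ckpt_d["optimizer"])

    # The DPD has no state to restore: it starts at the full initial lr,
    # higher than the decayed lr of the rest of the model at this point.
    optim_dur = torch.optim.AdamW(net_dur_disc.parameters(), LR_INIT, betas=BETAS, eps=EPS)

    print("resumed lr (g):", optim_g.param_groups[0]["lr"])
    print("fresh lr (dpd):", optim_dur.param_groups[0]["lr"])
    return optim_g, optim_d, optim_dur
```

The sketch just makes visible the mismatch being described: the main model continues at a decayed lr while the new DPD starts at the full initial lr.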

p0p4k commented 7 months ago

The dpd is very naive in my implementation and could use a more sophisticated model.

HuuHuy227 commented 7 months ago

"The dpd is very naive in my implementation" is that the reason why i training with DurationDiscriminator and got stuck at the first steps. And should we use it.

isdanni commented 7 months ago

In my experience use_sdp didn't get stuck; maybe try lowering the batch size? There might also be some other issues causing it.

JohnHerry commented 7 months ago

Are you training from scratch, or fine-tuning from a pretrained model?

HuuHuy227 commented 7 months ago

Yes, I'm training from scratch and it got stuck. Should I apply the training strategy you mentioned above?

JohnHerry commented 7 months ago

No, nothing different. The paper does not give any details about the DPD at all; you can train without it.
