WendongGan closed this issue 1 year ago.
https://arxiv.org/pdf/2206.12132.pdf Section 2.5
It may be that each has its advantages and disadvantages; the DDP in VITS is slightly worse and might be improved with better training.
(from vits paper)
Yes, we have the parts ready; I just have to put them in train.py. Will do it by the weekend. You can still train a no-SDP model and then transfer-learn it into SDP. We can check the performance in our own experiments. Also, the problem with papers is that they only give metrics for that particular dataset; SDP might not work well on other datasets, so it is quite variable.
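The "transfer learn it into SDP" idea above boils down to loading the no-SDP checkpoint into the SDP-enabled model and letting only the matching weights carry over. A minimal sketch of that pattern, assuming hypothetical module names (`TinyGenerator`, `dp`, `sdp` are illustrative stand-ins, not the repo's actual classes):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real VITS synthesizer; in the actual repo
# this would be the model built with use_sdp=False vs use_sdp=True.
class TinyGenerator(nn.Module):
    def __init__(self, use_sdp: bool):
        super().__init__()
        self.encoder = nn.Linear(8, 8)   # shared between both variants
        self.decoder = nn.Linear(8, 8)   # shared between both variants
        if use_sdp:
            self.sdp = nn.Linear(8, 1)   # stochastic duration predictor (placeholder)
        else:
            self.dp = nn.Linear(8, 1)    # deterministic duration predictor (placeholder)

# Train (or load) the no-SDP model first...
nosdp = TinyGenerator(use_sdp=False)

# ...then build the SDP model and copy every weight whose name matches.
# strict=False skips the dp.* keys that have no counterpart in the new model
# and leaves the freshly initialized sdp.* weights untouched.
sdp_model = TinyGenerator(use_sdp=True)
missing, unexpected = sdp_model.load_state_dict(nosdp.state_dict(), strict=False)
print(missing)     # keys only in the SDP model: sdp.weight, sdp.bias
print(unexpected)  # keys only in the checkpoint: dp.weight, dp.bias
```

The shared encoder/decoder weights are carried over intact, so continued training only has to learn the new SDP parameters from scratch.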
Added the adversarial duration predictor; try it and let me know if there are any training errors.
@p0p4k hey, I have one question about the "transfer learn it into sdp" part, if you don't mind :smiley: do you mean just continuing training with SDP from the checkpoints trained with DP?
Yes. Also, I think we can modify the code a little to train both the dp and sdp together and compare their performance simultaneously to save time.
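Training both predictors in one run could look like the sketch below: run the deterministic and stochastic duration predictors on the same encoder output, optimize both losses at once, and log them separately for comparison. This is only an illustration under simplified assumptions; the Conv1d layers stand in for the real DP/SDP modules, and `mse_loss` stands in for the actual duration losses:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative shapes: (batch, channels, text frames). In the real repo the
# hidden states come from the text encoder and the targets from MAS.
text_hidden = torch.randn(4, 192, 50)
target_logdur = torch.randn(4, 1, 50)

dp = nn.Conv1d(192, 1, 3, padding=1)    # stand-in for the deterministic predictor
sdp = nn.Conv1d(192, 1, 3, padding=1)   # stand-in for the flow-based SDP

dp_loss = F.mse_loss(dp(text_hidden), target_logdur)
sdp_loss = F.mse_loss(sdp(text_hidden), target_logdur)

# Summing both losses trains the two heads in one run; log dp_loss and
# sdp_loss separately to compare their curves. Note that with a shared
# encoder, both heads would backprop into it; detach one branch if you
# want a cleaner comparison.
total = dp_loss + sdp_loss
total.backward()
```

One caveat with this setup is that both heads pull on any shared parameters, so the comparison is not perfectly isolated; detaching the encoder output for one branch is a cheap way to mitigate that.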
In my experiments, all instances with the DPD (Duration Predictor Discriminator) failed to run for enough steps; they got stuck. And the results synthesized with the stuck checkpoints were no better than those without the DPD.
As in the VITS2 paper, our training pipeline is: train without the DPD to about 700K or 800K steps, then continue training with the DPD for about 30K steps [mostly stuck after a few thousand steps]. We made no special change to the learning rate when continuing; that is, the DPD was trained with the initial lr=0.0002, while the other parts continued with the scheduler-decayed lr at that step.
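The lr mismatch described above (fresh DPD at the initial lr, resumed parts at the decayed lr) can be expressed with per-parameter-group options in a single optimizer. A minimal sketch, with hypothetical placeholder modules and a hypothetical decayed-lr value:

```python
import torch
import torch.nn as nn

init_lr = 2e-4        # initial lr, as in the VITS config
decayed_lr = 1.1e-4   # hypothetical value the scheduler has reached at ~800K steps

dpd = nn.Linear(8, 1)    # stand-in for the newly added duration discriminator
rest = nn.Linear(8, 8)   # stand-in for the resumed model parts

# Two param groups: the resumed parts keep their decayed lr, the fresh DPD
# starts from the initial lr (the setup described in the message above).
opt = torch.optim.AdamW([
    {"params": rest.parameters(), "lr": decayed_lr},
    {"params": dpd.parameters(), "lr": init_lr},
])
```

One thing worth trying against the "stuck" behavior: instead of starting the DPD at the full initial lr, warm it up from a much smaller value (or simply lower its group lr), so the fresh discriminator does not destabilize the well-converged generator.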
The dpd is very naive in my implementation and could use some more sophisticated model.
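For reference, a "naive" DPD along these lines can be sketched as a small conv stack that conditions on the text-encoder hidden states, takes the (real or predicted) log-durations as an extra channel, and scores each position. This is a hedged illustration of the general idea, not the repo's actual `DurationDiscriminator` module; all layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class DurationDiscriminator(nn.Module):
    """Illustrative sketch: scores (hidden-state, log-duration) pairs per frame."""

    def __init__(self, hidden_channels=192, filter_channels=256):
        super().__init__()
        # +1 input channel for the log-duration track
        self.pre = nn.Conv1d(hidden_channels + 1, filter_channels, 3, padding=1)
        self.convs = nn.Sequential(
            nn.LeakyReLU(0.1),
            nn.Conv1d(filter_channels, filter_channels, 3, padding=1),
            nn.LeakyReLU(0.1),
        )
        self.out = nn.Conv1d(filter_channels, 1, 1)

    def forward(self, text_hidden, log_dur, mask):
        # text_hidden: (B, H, T), log_dur: (B, 1, T), mask: (B, 1, T)
        x = torch.cat([text_hidden, log_dur], dim=1) * mask
        x = self.convs(self.pre(x))
        return self.out(x) * mask  # (B, 1, T) per-position real/fake logits

d = DurationDiscriminator()
h = torch.randn(2, 192, 40)
dur = torch.randn(2, 1, 40)
mask = torch.ones(2, 1, 40)
logits = d(h, dur, mask)  # shape (2, 1, 40)
```

A "more sophisticated" variant might add normalization, dilated convolutions, or condition on speaker embeddings; the paper gives no details, so any of these is a guess.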
Regarding "The dpd is very naive in my implementation": is that the reason my training with DurationDiscriminator got stuck at the first steps? And should we use it?
In my experience use_sdp didn't get stuck; maybe try lowering the batch size? There might also be some other issues causing it.
Are you training from scratch, or fine-tuning from a pretrained checkpoint?
Yes, I'm training from scratch and it got stuck. Should I apply the training strategy you mentioned above?
No, nothing different. The paper did not give any details about the DPD at all; you can train without it.
"note - duration predictor is not adversarial yet. In my earlier experiments with VITS-1, I used deterministic duration predictor (no-sdp) and found that it is quite good. So, I am not sure if adversarial duration predictor is necessary. But, I will add it sooner or later if it is necessary. Also, I want to combine parallel tacotron-2 and naturalspeech-1's learnable upsampling layer to remove MAS completely for E2E differentiable model."
On this issue, I would like to give my feedback:
The experience of most of us training VITS is that its rhythm is relatively flat and not expressive enough. I think this problem probably comes from MAS and the duration predictor.
So, for now, the adversarial duration predictor is an important improvement of VITS2 over VITS.
3. Deterministic duration predictor (no-SDP): the VITS-1 paper seems to include an ablation experiment showing it is not good enough.
StyleTTS2 is better than VITS in rhythmic diversity and expressiveness, which also demonstrates the shortcomings of VITS. https://styletts2.github.io/
So, I think the adversarial duration predictor is necessary. Looking forward to your code update; I will then experiment and discuss the results with you.