metallic sound when applying phaseaug on avocodo

wblgers commented 1 year ago

Hi,

Thanks for your great work. I am trying to apply PhaseAug to Avocodo (whose discriminator is a little different from HiFi-GAN's), but the output has a metallic sound. Training is currently at around 200k steps. Did you encounter the same problem with HiFi-GAN?

Thanks

junjun3518 commented 1 year ago

Hi, wblgers!

We have not tried it on Avocodo before, so I am not sure, but I think the cause is one of the following:

1. 200k steps is too early, since HiFi-GAN needs 2.5M steps.
2. You did not apply the filter option of PhaseAug. We also encountered a metallic sound during the early stage of our research, and we solved it by low-pass filtering (check the last paragraph of Section 2.2 and the use_filter argument in phaseaug.py).
3. An issue in the Avocodo implementation (since no official implementation exists).

(+ The Avocodo paper mentions that they trained a model for up to 3M steps.)
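To illustrate why the filter option can matter, here is a minimal, self-contained sketch. It is not the code in phaseaug.py: it assumes that the filter smooths the random per-frequency phase offsets along the frequency axis, and the kernel shape, `cutoff`, `taps`, and every function name below are hypothetical choices made for the illustration.

```python
import cmath
import math
import random


def lowpass_kernel(cutoff=0.05, taps=31):
    """Hann-windowed sinc kernel with unit DC gain (cutoff and taps are assumed values)."""
    mid = (taps - 1) / 2
    kernel = []
    for n in range(taps):
        t = n - mid
        sinc = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (taps - 1))
        kernel.append(sinc * window)
    total = sum(kernel)
    return [k / total for k in kernel]


def smooth(values, kernel):
    """Filter along the frequency axis, replicating the edge values as padding."""
    half = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - half, 0), last)
            acc += k * values[idx]
        out.append(acc)
    return out


def max_jump(phases):
    """Largest phase difference between neighbouring frequency bins."""
    return max(abs(a - b) for a, b in zip(phases, phases[1:]))


rng = random.Random(0)
n_bins = 513  # e.g. one frame of a 1024-point STFT

# Independent random phase offset per frequency bin (unfiltered case).
raw_phase = [rng.gauss(0.0, 1.0) for _ in range(n_bins)]
# Low-pass filtered offsets: neighbouring bins now rotate by similar amounts.
smooth_phase = smooth(raw_phase, lowpass_kernel())

# Rotate one toy complex spectrum frame; magnitudes stay untouched.
frame = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_bins)]
rotated = [x * cmath.exp(1j * p) for x, p in zip(frame, smooth_phase)]

print(max_jump(raw_phase), max_jump(smooth_phase))
```

Under these assumptions, the filtered offsets change much more gradually from bin to bin than the unfiltered ones, while the rotation itself leaves every magnitude unchanged.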

Please give me more information to diagnose your problem: how you applied PhaseAug to your Avocodo, which Avocodo implementation you use, the dataset, etc.

Thank you for your interest! Best regards, Junhyeok Lee

junjun3518 commented 1 year ago

Hi @wblgers, ncsoft released the official Avocodo implementation! https://github.com/ncsoft/avocodo please try it!

wblgers commented 1 year ago

Hi @junjun3518, thanks for your reply. I found the root cause: just as you mentioned, I had not applied the filter option of PhaseAug. The metallic sound has now disappeared during training.

I followed the unofficial implementation of avocodo: https://github.com/rishikksh20/Avocodo-pytorch

I also noticed that the official version has been released, so I'll try PhaseAug on the official Avocodo. Thanks again for your work; I'll update my progress if it works!

junjun3518 commented 1 year ago

Hi @wblgers, I am happy to hear that you found the cause.

I am now also trying to apply the Avocodo discriminators to VITS with PhaseAug, as shown below. For now, I do not apply PhaseAug to the hierarchical signals for CoMBD.

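                # Augment the real audio and only the final generator output (detached for the discriminator step)
                aug_y_, aug_y_hat_last = aug.forward_sync(y_, y_hat_[-1].detach())
                # Hierarchical (intermediate) outputs are detached but not augmented
                aug_y_hat_ = [_y.detach() for _y in y_hat_[:-1]]
                aug_y_hat_.append(aug_y_hat_last)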

I look forward to hearing about your progress. Merry Christmas.

lexkoro commented 1 year ago

I am also using Avocodo with VITS. I have added PhaseAug, but I am not sure whether it has brought any improvement. Why are you skipping the last element?

Following the example code, this is how I added PhaseAug:

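          # Discriminator
          if hps.train.use_phaseaug:
              # Run PhaseAug in full precision, outside mixed-precision autocast
              with autocast(enabled=False):
                  aug_y, aug_y_g = phase_aug.forward_sync(y, y_hat.detach())
          else:
              # Detach here as well so the discriminator loss does not backpropagate into the generator
              aug_y, aug_y_g = y, y_hat.detach()

          y_df_hat_r, y_df_hat_g, _, _ = mcmbd(aug_y, aug_y_g, x2.detach(), x1.detach())
          y_ds_hat_r, y_ds_hat_g, _, _ = msbd(aug_y, aug_y_g)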

junjun3518 commented 1 year ago

Hi @lexkoro, since I used the official implementation with some modifications, y_hat_ = [x2, x1, y_hat] and y_ = y in your notation. I only apply forward_sync to y_ and y_hat_[-1], then append the result as aug_y_hat_last, so it is the same as your implementation.
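The equivalence of the two notations can be checked with a small sketch. Everything here is a stand-in: plain Python lists replace tensors (so the `.detach()` calls are omitted), `toy_forward_sync` is a hypothetical placeholder for `forward_sync` that simply flips polarity, and `augment_final_only` is a helper name invented for this example.

```python
def augment_final_only(forward_sync, y, y_hat_list):
    """Augment the real signal and only the last (final) generator output."""
    aug_y, aug_last = forward_sync(y, y_hat_list[-1])
    # Intermediate (hierarchical) outputs are passed through without augmentation.
    return aug_y, list(y_hat_list[:-1]) + [aug_last]


def toy_forward_sync(real, fake):
    """Hypothetical stand-in for forward_sync: the same transform for both inputs."""
    return [-s for s in real], [-s for s in fake]


# Flat notation: x2 and x1 are intermediate outputs, y_hat is the final output.
y = [0.1, -0.2, 0.3, 0.0]
x2, x1, y_hat = [0.5], [0.4, -0.4], [0.1, -0.1, 0.2, 0.0]

# List notation: y_hat_ = [x2, x1, y_hat]
aug_y_list, aug_y_hat_ = augment_final_only(toy_forward_sync, y, [x2, x1, y_hat])

# Flat notation: augment (y, y_hat) and hand x2, x1 to the discriminator separately.
aug_y_flat, aug_y_g = toy_forward_sync(y, y_hat)

print(aug_y_hat_ == [x2, x1, aug_y_g])
```

In both notations the discriminators receive the unaugmented intermediate outputs together with the augmented final output.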

I am not sure about the improvement from VITS + Avocodo_disc + PhaseAug, because I only started these experiments last week. For now, I think PhaseAug can reduce periodicity artifacts, but the Avocodo discriminators also reduce them (maybe more; I am not sure), so the difference is hard to notice. In the VITS + PhaseAug case, we can notice that PhaseAug reduces the occurrence of the artifacts.

PhaseAug has advantages when:

- the dataset is small
- you need to fine-tune from pretrained models
- the model is hard to modify
- GPU memory is limited

In my experience, the time and memory consumption per epoch follow this order: mpd < mpd+PhaseAug << (combd+sbd) < (combd+sbd)+PhaseAug.

Thank you for your interest, and I hope you can hear the difference. Have a happy new year.

lexkoro commented 1 year ago

Hey @junjun3518 thank you for the detailed answer.

I will continue to test and report if necessary.

I also wish you a happy new year.

dathudeptrai commented 1 year ago

@junjun3518 Hi and thanks for your great work. I just wonder if you only apply phaseaug for discriminator training in VITS or for generator training as well.

junjun3518 commented 1 year ago

Hi @dathudeptrai!

Differentiable augmentation should be applied during both discriminator training and generator training. You can find the reasoning in the DiffAugment and StyleGAN2-ADA papers.

Briefly, the goal of a GAN is to make $p(G(z))$ match $p(x)$ by means of the discriminator, but samples of $x$ are usually limited. So we apply a differentiable augmentation $T$ to both $x$ and $G(z)$, giving $T(x)$ and $T(G(z))$, respectively.
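Written out, the argument looks as follows. This is only a sketch: the least-squares GAN form is an assumption, chosen because HiFi-GAN-style vocoders commonly use it, and $D$, $G$, and $z$ denote the usual discriminator, generator, and generator input.

$$
\mathcal{L}_D = \mathbb{E}_{x}\left[(D(T(x)) - 1)^2\right] + \mathbb{E}_{z}\left[D(T(G(z)))^2\right]
$$

$$
\mathcal{L}_G = \mathbb{E}_{z}\left[(D(T(G(z))) - 1)^2\right]
$$

Since $T$ also appears in $\mathcal{L}_G$, it has to be differentiable so that gradients reach $G$. If $T$ were applied only in $\mathcal{L}_D$, the discriminator would judge $T(G(z))$ during its own update but $G(z)$ during the generator update.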

I think the VITS example in README.md confused you. It only illustrates the autocast case. You should apply PhaseAug in both the discriminator and the generator update phases, as in the HiFi-GAN example.

Thank you for the great question! Best regards, Junhyeok Lee
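The two update phases can be sketched without any deep learning library. This is not the README code: `forward_sync` is the call name used in this thread, while `Signal`, `toy_forward_sync`, `discriminator_phase`, and `generator_phase` are hypothetical stand-ins. In real training these would be PyTorch tensors and a PhaseAug instance.

```python
class Signal:
    """Stand-in for a waveform tensor; it only tracks whether gradients may flow."""

    def __init__(self, samples, requires_grad=False):
        self.samples = list(samples)
        self.requires_grad = requires_grad

    def detach(self):
        # Same samples, but cut off from the generator's graph.
        return Signal(self.samples, requires_grad=False)


def toy_forward_sync(real, fake):
    """Stand-in for forward_sync: one shared transform applied to both inputs.

    A polarity flip (a phase rotation by pi) replaces the random rotation.
    """
    def flip(sig):
        return Signal([-s for s in sig.samples], sig.requires_grad)

    return flip(real), flip(fake)


def discriminator_phase(forward_sync, y, y_g_hat):
    # Discriminator update: augment, but block gradients to the generator.
    return forward_sync(y, y_g_hat.detach())


def generator_phase(forward_sync, y, y_g_hat):
    # Generator update: augment again, keeping the gradient path through T.
    return forward_sync(y, y_g_hat)


y = Signal([0.1, -0.2, 0.3])                             # real audio
y_g_hat = Signal([0.0, -0.1, 0.2], requires_grad=True)   # generator output

_, d_fake = discriminator_phase(toy_forward_sync, y, y_g_hat)
_, g_fake = generator_phase(toy_forward_sync, y, y_g_hat)

print(d_fake.requires_grad, g_fake.requires_grad)
```

The only difference between the two phases is the `.detach()` call: the augmentation itself is applied in both.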

junjun3518 commented 1 year ago

Hi there,

Thank you for your questions and for trying to use PhaseAug. It seems that the first reporter's issue is solved, so I will close this issue. If you have any further questions, please feel free to open a new issue.

Best regards, Junhyeok Lee

Selimonder commented 1 year ago

Thank you for your answers @junjun3518!

"For now, I did not apply phaseaug to hierarchical signals for CoMBD."

Is there a reason why you didn't apply phase augmentation to all the hierarchical levels?

junjun3518 commented 1 year ago

Hi @Selimonder,

I had a discussion with the author of Avocodo, and he said, "CoMBD's hierarchical discriminating could prevent upsampling artifacts".

After discussing this further, I believe that applying PhaseAug to the final output is sufficient. While it is possible to apply PhaseAug to hierarchical output, I decided not to for two reasons:

1. It needs additional computation.
   - Both PhaseAug and Avocodo need additional computation. Since applying both of them already adds to the cost, I do not want to increase it further.
2. I'm not convinced it's worth it.
   - Applying PhaseAug to the final output already affects all inputs of the discriminators, except the hierarchical outputs.
   - Our target is to increase the quality of the final output, not of the hierarchical outputs.
   - The hierarchical outputs already serve their purpose by preventing upsampling artifacts.

It's an important issue to consider! We appreciate your question.

Best regards, Junhyeok Lee

maum-ai / phaseaug

metallic sound when applying phaseaug on avocodo #5