xcmyz / FastVocoder

Includes Basis-MelGAN, MelGAN, HifiGAN and Multiband-HifiGAN; NHV may be added in the future.
MIT License

Multiband Architecture #5

Open Rongjiehuang opened 3 years ago

Rongjiehuang commented 3 years ago

Hi author, I found the note "the generated audio has interference at a specific frequency" in this repo. I have run into a straight line at a specific frequency when developing a similar multiband architecture, and I wonder whether this is the phenomenon you mentioned. Do you have any advice or solutions? Thanks. [attachment: audio]

xcmyz commented 3 years ago

You can refer to https://github.com/xcmyz/FastVocoder/blob/main/bin/synthesize.py#L79

Rongjiehuang commented 3 years ago

Hi, I tried it and found that the trick does not solve this problem. Because the synthesized samples vary randomly between the two synthesis passes, the subtraction can overshoot. For example, at some position a clearer segment (0.02, 0.05, 0.06) minus a bias (0.05, 0.05, 0.02) gives (-0.03, 0, 0.04), which means the first sample gets worse.
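
The arithmetic in the comment above can be checked with a minimal sketch. It is not the code from `synthesize.py`; `subtract_bias` is a hypothetical helper, and the only inputs are the three sample values quoted in the comment.

```python
def subtract_bias(segment, bias):
    """Element-wise subtraction of a bias signal from a synthesized segment."""
    # Rounding only hides floating-point noise in this toy example.
    return [round(s - b, 10) for s, b in zip(segment, bias)]

segment = [0.02, 0.05, 0.06]  # the "clearer segment" from the comment
bias = [0.05, 0.05, 0.02]     # the bias taken from a second synthesis pass

residual = subtract_bias(segment, bias)

# A sample "gets worse" when its magnitude grows after the subtraction.
worse = [abs(r) > abs(s) for r, s in zip(residual, segment)]
```

Only the first sample grows in magnitude (0.02 becomes 0.03 in absolute value), which is the overshoot being described.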

xcmyz commented 3 years ago

In my case, it solves the checkerboard artifact problem. You could also add some lower-quality speech, such as aishell3, to the training data: I combined the biaobei data with aishell3 for training, and the problem went away. Besides that, you can try applying the μ-law (u-law) algorithm to each band separately and normalizing each band separately to fix the problem.
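
The per-band μ-law and normalization suggestion could look roughly like the sketch below. The comment does not say where in the pipeline this is applied, so the sketch assumes it operates on a list of sub-band signals (for example the PQMF analysis outputs). All function names, the peak normalization, and `mu=255` are assumptions for illustration, not the author's code.

```python
import math

def mu_law_encode(x, mu=255.0):
    """Standard mu-law companding for a sample in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_decode(y, mu=255.0):
    """Inverse of mu_law_encode."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def compand_subbands(subbands, mu=255.0, eps=1e-8):
    """Peak-normalize each band on its own, then apply mu-law per band."""
    encoded, scales = [], []
    for band in subbands:
        scale = max(max(abs(v) for v in band), eps)  # per-band peak
        encoded.append([mu_law_encode(v / scale, mu) for v in band])
        scales.append(scale)
    return encoded, scales

def expand_subbands(encoded, scales, mu=255.0):
    """Undo the companding and restore each band's original scale."""
    return [[mu_law_decode(v, mu) * s for v in band]
            for band, s in zip(encoded, scales)]

# Two toy sub-bands with very different energy levels.
subbands = [[0.5, -0.25, 0.1], [0.01, -0.02, 0.005]]
encoded, scales = compand_subbands(subbands)
restored = expand_subbands(encoded, scales)
```

After companding, both bands span the full [-1, 1] range even though the second one is much quieter, which is the point of normalizing each band independently.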

RuqiaoLiu commented 3 years ago

Hi, I have also run into a straight line at a specific frequency when developing a similar multiband architecture, for example multiband MelGAN. Do you have a trick to solve it now?

Rongjiehuang commented 3 years ago

There are three general approaches for dealing with these constant lines:

  1. Train for more steps.
  2. Add a discriminator (this works for GAN-based waveform generation).
  3. After PQMF synthesis, pass the full-band waveform through an additional conv layer.
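
The thread does not explain why the third approach helps. One possible intuition, offered here as an assumption rather than something stated by the commenters, is that a 1-D convolution over the full-band waveform is a FIR filter, and a FIR filter is able to cancel a tone at a fixed frequency. The sketch below uses plain Python in place of a trainable layer (in a real model this would be something like `torch.nn.Conv1d`); `conv1d_same`, the kernels, and the tone frequency are all hypothetical.

```python
import math

def conv1d_same(signal, kernel):
    """Zero-padded 'same' 1-D convolution for an odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]

# Stand-in for the constant spectral line: a pure tone at a fixed frequency.
omega = 2 * math.pi * 0.1
tone = [math.sin(omega * n) for n in range(64)]

# An identity kernel leaves the waveform untouched.
identity_kernel = [0.0, 1.0, 0.0]
unchanged = conv1d_same(tone, identity_kernel)

# A 3-tap kernel with a spectral zero at omega cancels the tone.
notch_kernel = [1.0, -2.0 * math.cos(omega), 1.0]
filtered = conv1d_same(tone, notch_kernel)

# Ignore the two edge samples, which are affected by zero padding.
interior_peak = max(abs(v) for v in filtered[1:-1])
```

Here `interior_peak` is numerically zero, so even a very short kernel can represent a filter that removes a fixed-frequency line. Whether a trained post-PQMF layer actually learns such a filter is not confirmed in the thread.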

ysujiang commented 2 years ago

Is it better than HifiGAN?

HaiFengZeng commented 1 year ago

@Rongjiehuang Thanks, the last suggestion works for me!