lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Apache License 2.0
767 stars, 112 forks

Code for evaluation in paper #32

Open v-nhandt21 opened 2 years ago

v-nhandt21 commented 2 years ago

The paper mentions several automatic evaluation metrics. Where can I find the scripts for them, so that I can reproduce the results and compare with other methods?

sharvil commented 2 years ago

There's no implementation of these metrics in this repository. A code contribution here would be really helpful!

zzw-zwzhang commented 2 years ago

Have you solved this problem?

zzw-zwzhang commented 2 years ago

Could you provide the evaluation code? Thank you!

v-nhandt21 commented 2 years ago

For FID, I found an implementation in torchmetrics: https://torchmetrics.readthedocs.io/en/stable/image/frechet_inception_distance.html

But it requires the input to be dtype=torch.uint8, while the spectrogram I generate is float. If I try to convert it to int, I get an error: ZeroDivisionError: float division by zero

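import torch
from torchmetrics.image.fid import FrechetInceptionDistance

def evaluation(mel_infer, mel_gt):
    # Add batch and channel dims, cast to uint8, and repeat to 3 channels
    mel_infer = torch.from_numpy(mel_infer).to("cuda").unsqueeze(0).unsqueeze(0).to(torch.uint8).repeat(1,3,1,1)
    mel_gt = torch.from_numpy(mel_gt).to("cuda").unsqueeze(0).unsqueeze(0).to(torch.uint8).repeat(1,3,1,1)

    # Clamp both tensors (min and max are both 1e-5, as posted)
    mel_infer = torch.clamp(mel_infer, min=1e-5, max=1e-5)
    mel_gt = torch.clamp(mel_gt, min=1e-5, max=1e-5)

    # One generated and one real example are passed to the metric
    fid = FrechetInceptionDistance(feature=2048).to("cuda")
    fid.update(mel_infer, real=False)
    fid.update(mel_gt, real=True)
    f = fid.compute()

    return f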
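
A possible way around the dtype problem, sketched under assumptions rather than taken from this repository: rescale the float spectrograms to the 0..255 range before casting, instead of casting directly (which truncates the values) or clamping with equal min and max (which makes every value identical). The helper name `mels_to_uint8_images` and its `lo`/`hi` parameters are hypothetical, and the sketch assumes all spectrograms have been cropped or padded to the same shape. The division error may also be related to passing a single example per side, since FID estimates a covariance over many samples, but that is a guess, not something confirmed in this thread.

```python
import numpy as np

def mels_to_uint8_images(mels, lo=None, hi=None):
    """Convert float mel spectrograms, each [n_mels, frames], to a uint8
    batch of shape [N, 3, n_mels, frames].

    Hypothetical helper: lo/hi are the values mapped to 0 and 255. If they
    are omitted, the min and max of this batch are used; for a fair
    comparison the same lo/hi should be reused for real and generated sets.
    """
    batch = np.stack([np.asarray(m, dtype=np.float32) for m in mels])  # [N, H, W]
    lo = float(batch.min()) if lo is None else lo
    hi = float(batch.max()) if hi is None else hi
    scale = max(hi - lo, 1e-8)  # guard against a constant input
    scaled = np.clip((batch - lo) / scale, 0.0, 1.0) * 255.0
    images = np.rint(scaled).astype(np.uint8)
    # Repeat the single channel three times, as image models expect RGB.
    return np.repeat(images[:, None, :, :], 3, axis=1)
```

The resulting array can be wrapped with `torch.from_numpy(...)` and passed to `fid.update(...)` as a whole batch, for both the real and the generated set. Recent torchmetrics releases also document a `normalize` option on `FrechetInceptionDistance` for float inputs in [0, 1], which may be worth checking for your installed version.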

gzhu06 commented 2 years ago

You can try this: https://github.com/gzhu06/Unconditional-Audio-Generation-Benchmark

FlyingCan commented 1 year ago

Hello, I have what may be a silly question: how can I compute the FID between the test dataset and the train dataset? Both are real data distributions. For example, if I want to compute the FID of the train dataset, would I call compute_fid_function(train_dataset, train_dataset)?
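
To make the question concrete, here is a minimal sketch of the Fréchet distance between two sets of feature vectors, assuming the features have already been extracted by some embedding model. The function name `frechet_distance` and the random stand-in features are hypothetical, not part of this repository or of torchmetrics. It illustrates that comparing a set with itself gives a value near zero, so the informative baseline for real data is train against test.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape [num_samples, feature_dim].
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # trace(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory;
    # the clip removes tiny negative values caused by rounding.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Stand-in features (assumption): in practice these would be embeddings of
# the train and test sets from the same feature extractor.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 8))
test_feats = rng.normal(size=(500, 8))

same = frechet_distance(train_feats, train_feats)      # a set against itself
baseline = frechet_distance(train_feats, test_feats)   # real against real
```

Here `same` comes out at essentially zero regardless of the data, while `baseline` is small but reflects sampling noise between two real sets, which gives a reference point for judging the score of generated samples.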