haoxiangsnr / Wave-U-Net-for-Speech-Enhancement

A PyTorch implementation of Wave-U-Net, adapted for speech enhancement.
https://arxiv.org/abs/1806.03185
MIT License
323 stars 66 forks source link

Assertion Error at len(mixture) == len(clean) == len(enhanced) #7

Open mnabihali opened 4 years ago

mnabihali commented 4 years ago

When I try to run the code, it fails on this condition: assert len(mixture) == len(clean) == len(enhanced). I printed the length of each signal and found that enhanced and clean have the same length, but mixture is longer than both.

I hope you can help me as soon as possible

Lerry123 commented 4 years ago

Hello, have you solved your problem? I have the same problem as you.

mnabihali commented 4 years ago

No, but I removed this assertion and calculated the required scores manually (a signal-by-signal comparison, which does not raise an error). If you find another solution, please tell me.
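One way to do a signal-by-signal comparison without deleting the assertion is to trim all three signals to their shortest common length before scoring. The following is only a sketch: `trim_to_common_length` is a hypothetical helper, not part of the repository, and it assumes the signals are aligned at the first sample so that any extra samples are trailing padding.

```python
from typing import Sequence, Tuple


def trim_to_common_length(*signals: Sequence[float]) -> Tuple[Sequence[float], ...]:
    """Cut every signal to the length of the shortest one.

    Hypothetical helper: assumes the signals start at the same sample and
    that any extra samples at the end are padding that can be dropped.
    """
    shortest = min(len(signal) for signal in signals)
    return tuple(signal[:shortest] for signal in signals)


# Toy signals: the mixture carries two extra (padded) samples.
mixture = [0.1, 0.2, 0.3, 0.0, 0.0]
clean = [0.1, 0.2, 0.3]
enhanced = [0.1, 0.2, 0.3]

mixture, clean, enhanced = trim_to_common_length(mixture, clean, enhanced)
assert len(mixture) == len(clean) == len(enhanced)
```

Slicing works the same way on 1-D NumPy arrays, so the helper could in principle be applied right before the assertion, but that has not been checked against the repository code.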

Lerry123 commented 4 years ago

Are you Chinese? We can chat on QQ. My English is not good.

Lerry123 commented 4 years ago

I found that this code pads the mixture, but it does not pad the clean and enhanced signals. Which dataset do you use, and how are the results?
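The length mismatch follows directly from the padding arithmetic: the mixture is extended to the next multiple of the model's chunk size, while the other signals keep their original length. A minimal sketch of that arithmetic, where `pad_amount` is a hypothetical helper and the values 16384 and 50000 are assumed purely for illustration:

```python
def pad_amount(num_samples: int, sample_length: int) -> int:
    """Number of zeros needed to reach the next multiple of sample_length."""
    remainder = num_samples % sample_length
    return 0 if remainder == 0 else sample_length - remainder


sample_length = 16384  # assumed chunk size, for illustration only
clean_len = 50000      # assumed utterance length in samples

pad = pad_amount(clean_len, sample_length)
mixture_len = clean_len + pad  # only the mixture gets padded

print(pad)          # 15536
print(mixture_len)  # 65536, a multiple of sample_length
print(mixture_len == clean_len)  # False, so the assertion fails
```

Unless the utterance length already happens to be a multiple of the chunk size, the padded mixture is longer than the unpadded clean signal.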

mnabihali commented 4 years ago

Sorry, I am not Chinese. Regarding the dataset, I am using the VCTK dataset and a noisy version of the LibriSpeech dataset.

May I ask which part is doing the padding for the mixture?

Thanks

mnabihali commented 4 years ago

Did you find another solution to the problem?

Lerry123 commented 4 years ago

I am also using the VCTK database. I haven't found a solution yet. If I find one, I will tell you.

Lerry123 commented 4 years ago

I padded the clean, enhanced, and mixture signals, and that solved this problem, but now I have a new one: computing STOI raises an error. The details are as follows: AttributeError: module 'numpy' has no attribute 'gcd'. I haven't found a solution, so I only computed PESQ.

        # Metric
        #stoi_c_n.append(compute_STOI(clean, mixture, sr=16000))
        #stoi_c_e.append(compute_STOI(clean, enhanced, sr=16000))
        pesq_c_n.append(compute_PESQ(clean, mixture, sr=16000))
        pesq_c_e.append(compute_PESQ(clean, enhanced, sr=16000))

Did you have the same problem?

mnabihali commented 4 years ago

Can you share the code you used to pad the signals, so that I can try computing STOI?

mnabihali commented 4 years ago

@Lerry123 Can you tell me how you pad the signals, and I can try to solve the Numpy issue.

Thanks

Lerry123 commented 4 years ago

trainer.py:

    for i, (mixture, clean, name) in enumerate(self.validation_data_loader):
        assert len(name) == 1, "Only support batch size is 1 in enhancement stage."
        name = name[0]
        padded_length = 0

        mixture = mixture.to(self.device)  # [1, 1, T]
        clean = clean.to(self.device)

        # The input of the model should be fixed length, so zero-pad both
        # mixture and clean up to the next multiple of sample_length.
        if mixture.size(-1) % sample_length != 0:
            padded_length = sample_length - (mixture.size(-1) % sample_length)
            mixture = torch.cat([mixture, torch.zeros(1, 1, padded_length, device=self.device)], dim=-1)
            clean = torch.cat([clean, torch.zeros(1, 1, padded_length, device=self.device)], dim=-1)

        assert mixture.size(-1) % sample_length == 0 and mixture.dim() == 3
        mixture_chunks = list(torch.split(mixture, sample_length, dim=-1))

        enhanced_chunks = []
        for chunk in mixture_chunks:
            enhanced_chunks.append(self.model(chunk).detach().cpu())
        enhanced = torch.cat(enhanced_chunks, dim=-1)  # [1, 1, T + padded_length]

        # enhanced already has the padded length, so nothing is trimmed here:
        # all three signals keep the trailing zero padding.
        enhanced = enhanced.reshape(-1).numpy()
        clean = clean.cpu().numpy().reshape(-1)
        mixture = mixture.cpu().numpy().reshape(-1)

mnabihali commented 4 years ago

Thanks

Regarding the AttributeError: module 'numpy' has no attribute 'gcd':

np.gcd is available from NumPy version 1.15.0 onward, so check your NumPy version and get back to me if that does not solve it.
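A quick way to confirm this diagnosis is to compare the installed NumPy version against 1.15.0. The sketch below is illustrative only: `version_tuple` and `supports_gcd` are hypothetical helpers, and the parsing is deliberately simple rather than a full version-specifier parser.

```python
def version_tuple(version: str) -> tuple:
    """Turn '1.14.5' into (1, 14, 5), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad '1.15' to (1, 15, 0)
        parts.append(0)
    return tuple(parts)


def supports_gcd(numpy_version: str) -> bool:
    """np.gcd was added in NumPy 1.15.0."""
    return version_tuple(numpy_version) >= (1, 15, 0)


try:
    import numpy as np
    print(np.__version__, "has np.gcd:", supports_gcd(np.__version__))
except ImportError:
    print("NumPy is not installed in this environment")
```

If the check reports an older version, upgrading NumPy in the environment used for training should make the attribute available.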

Lerry123 commented 4 years ago

Thank you! It was solved by your suggestion.

mnabihali commented 4 years ago

Thanks. I hope everything will be fine. Can you give me your email so I can contact you about further problems?

Lerry123 commented 4 years ago

Can I get your email? Have you run the test? My results are very confusing.

diff7 commented 4 years ago

I think the problem is here: https://github.com/haoxiangsnr/Wave-U-Net-for-Speech-Enhancement/blob/c8c9d8945959ba8c3aa1e7cb18cddc10dbc52210/trainer/trainer.py#L78

should be:


```
if padded_length != 0:
    enhanced = enhanced[:, :, :-padded_length]
    mixture = mixture[:, :, :-padded_length]
```

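The effect of this fix on the lengths can be sketched end to end as pad, split into chunks, enhance, concatenate, and trim. The example below is not the repository's code: plain Python lists stand in for the `[1, 1, T]` tensors, `enhance_in_chunks` is a hypothetical helper, and `identity_model` is a stand-in for the trained network.

```python
from typing import Callable, List


def enhance_in_chunks(
    mixture: List[float],
    sample_length: int,
    model: Callable[[List[float]], List[float]],
) -> List[float]:
    """Pad to a multiple of sample_length, enhance per chunk, trim the padding."""
    # Zeros needed to reach the next multiple of sample_length (0 if aligned).
    padded_length = (-len(mixture)) % sample_length
    padded_mixture = mixture + [0.0] * padded_length

    enhanced: List[float] = []
    for start in range(0, len(padded_mixture), sample_length):
        chunk = padded_mixture[start:start + sample_length]
        enhanced.extend(model(chunk))

    # Drop the padding again so enhanced matches the original length.
    if padded_length != 0:
        enhanced = enhanced[:-padded_length]
    return enhanced


def identity_model(chunk: List[float]) -> List[float]:
    """Stand-in for the network: returns the chunk unchanged."""
    return list(chunk)


mixture = [float(i) for i in range(10)]
clean = [float(i) for i in range(10)]
enhanced = enhance_in_chunks(mixture, sample_length=4, model=identity_model)
assert len(mixture) == len(clean) == len(enhanced)
```

In this sketch the padded copy is kept in a separate variable, so the original mixture never needs trimming. In the trainer the mixture tensor itself is padded in place, which is why the fix has to trim both `enhanced` and `mixture`.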
Lerry123 commented 4 years ago

That's the problem, as you said. When I use the VCTK database, the test result is very bad and the speech is distorted.

diff7 commented 4 years ago

@Lerry123 did you try on the same dataset? And what are the advantages of training on VCTK?

I am training on the same dataset, 500 epochs so far, and the quality is not great. PESQ is quite low at 1.75, STOI is 0.85, and yes, the sound is distorted, but I will tweak some parameters and see if that gives a boost.

Lerry123 commented 4 years ago

I set sr=16000 in waveform_dataset.py and waveform_dataset_enhancement.py, and PESQ is now 2.63. That is the best result I have gotten.

waveform_dataset.py, lines 65 and 66:

    mixture, _ = librosa.load(os.path.abspath(os.path.expanduser(mixture_path)), sr=16000)
    clean, _ = librosa.load(os.path.abspath(os.path.expanduser(clean_path)), sr=16000)
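Since the metrics in this thread are computed with sr=16000, it can help to check which sample rate the audio files were actually stored at before loading them. The sketch below uses only the standard-library `wave` module to read the rate from a WAV header. `wav_sample_rate`, `needs_resampling`, and `make_silent_wav` are hypothetical helpers, and the in-memory silent files merely stand in for real dataset files.

```python
import io
import wave


def wav_sample_rate(source) -> int:
    """Return the sample rate stored in a WAV header (path or file object)."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(source, target_sr: int = 16000) -> bool:
    """True if the file's native rate differs from the rate the metrics assume."""
    return wav_sample_rate(source) != target_sr


def make_silent_wav(sample_rate: int, num_samples: int = 160) -> io.BytesIO:
    """Build a tiny in-memory mono 16-bit WAV, used here only for the demo."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_samples)
    buffer.seek(0)
    return buffer


print(needs_resampling(make_silent_wav(48000)))  # True
print(needs_resampling(make_silent_wav(16000)))  # False
```

Passing `sr=16000` to `librosa.load`, as in the lines above, resamples on load, so files for which this check returns True end up at the rate the metrics expect.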

diff7 commented 4 years ago

@Lerry123 got it, thanks! PESQ = 2.63, is that with VCTK?

Lerry123 commented 4 years ago

Yes

meisanhai commented 2 years ago

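May I ask how you changed the parameters? I used the same dataset as SEGAN, as in the original paper, and the output of the trained model is severely distorted. What could be going wrong? Could you help me figure it out? Thanks @Lerry123 @diff7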

mnabihali commented 2 years ago

According to one answer this could be the solution

I think problem is here: https://github.com/haoxiangsnr/Wave-U-Net-for-Speech-Enhancement/blob/c8c9d8945959ba8c3aa1e7cb18cddc10dbc52210/trainer/trainer.py#L77

should be:

    if padded_length != 0:
        enhanced = enhanced[:, :, :-padded_length]
        mixture = mixture[:, :, :-padded_length]

mnabihali commented 2 years ago

Sorry for the previous email, you could play with the parameters, and check. Thanks

meisanhai commented 2 years ago

Thank you very much for your reply. I used the dataset from https://datashare.ed.ac.uk/handle/10283/1942, but the training results are not good: PESQ=1.35, STOI=0.65. Which dataset did you use? @mnabihali

mnabihali commented 2 years ago

I applied it to my own dataset, which is a noisy version of the LibriSpeech dataset.

meisanhai commented 2 years ago

mnabihali, thank you very much! I changed the sampling rate in waveform_dataset.py to 16 kHz and the results improved.

VaeFlashMe commented 2 years ago

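I padded the clean, enhanced, and mixture signals, and that solved this problem, but now I have a new one: computing STOI raises an error. The details are as follows: AttributeError: module 'numpy' has no attribute 'gcd'. I haven't found a solution, so I only computed PESQ.

        # Metric
        #stoi_c_n.append(compute_STOI(clean, mixture, sr=16000))
        #stoi_c_e.append(compute_STOI(clean, enhanced, sr=16000))
        pesq_c_n.append(compute_PESQ(clean, mixture, sr=16000))
        pesq_c_e.append(compute_PESQ(clean, enhanced, sr=16000))

Did you have the same problem?

Hello, may I ask how you solved the error that occurs at the tenth epoch? I have been working on it for a long time without success.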

andyye1999 commented 2 years ago

I think the problem is here:

https://github.com/haoxiangsnr/Wave-U-Net-for-Speech-Enhancement/blob/c8c9d8945959ba8c3aa1e7cb18cddc10dbc52210/trainer/trainer.py#L78

should be:

```
if padded_length != 0:
    enhanced = enhanced[:, :, :-padded_length]
    mixture = mixture[:, :, :-padded_length]
```

thank you so much

renxuezhang commented 7 months ago

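Are you Chinese? We can chat on QQ. My English is not good.

Hello, I am a second-year master's student and I want to reproduce this code as the basis for a new contribution. I ran into some problems while reproducing it. Would you be able to take a look? Could we connect on QQ to discuss?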