seorim0 / DNN-based-Speech-Enhancement-in-the-frequency-domain

DNN-based speech enhancement (SE) in the frequency domain using PyTorch. You can test several state-of-the-art networks using the T-F masking or spectral mapping method.
MIT License

How do I convert wav files to numpy? #2

Closed Chriszhangmw closed 2 years ago

Chriszhangmw commented 2 years ago

Hi, I see that your dataloader uses torch to load np files directly, but the data generation part produces wav files. Could you share the process of converting wav files to npy files? Thanks.

Chriszhangmw commented 2 years ago

Of course librosa.load returns an array that can be saved, but different wav files yield np.arrays of inconsistent dimensions after loading, so I'd like to see how you handle that. I'm a beginner and would appreciate some pointers. Thanks.

seorim0 commented 2 years ago

@Chriszhangmw

Sorry for the late reply.
The process of converting wav files to numpy arrays is as follows:

import librosa
import soundfile

speech_dataset = []

# read each noisy/clean wav pair and resample to the target rate if needed
for addr_speech in noisy_speech_list:
    noisy_speech, fs = soundfile.read(addr_speech[0])
    if fs != cfg.fs:
        noisy_speech = librosa.resample(noisy_speech, orig_sr=fs, target_sr=cfg.fs)
    clean_speech, fs = soundfile.read(addr_speech[1])
    if fs != cfg.fs:
        clean_speech = librosa.resample(clean_speech, orig_sr=fs, target_sr=cfg.fs)
    speech_dataset.append([noisy_speech, clean_speech])

Then normalize speech_dataset.
I hope this answer was helpful :)
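The normalization step and the final save are not shown in the reply above; here is a minimal sketch under the assumption that the pair is peak-normalized by its maximum absolute amplitude (other schemes such as RMS normalization work too) and then written out with `np.save` to produce the .npy file the question asks about. The sample arrays are stand-ins for one `[noisy_speech, clean_speech]` entry:

```python
import numpy as np

# Hypothetical stand-ins for one [noisy, clean] pair from speech_dataset.
noisy = np.array([0.5, -1.0, 0.25], dtype=np.float32)
clean = np.array([0.4, -0.8, 0.2], dtype=np.float32)

# Peak-normalize both signals by the pair's maximum absolute amplitude
# (an assumed choice; the repo does not specify its normalization).
peak = max(np.abs(noisy).max(), np.abs(clean).max())
noisy /= peak
clean /= peak

# Save the pair as a single .npy file that torch can later load.
np.save('pair_000.npy', np.stack([noisy, clean]))
```

After this, `np.load('pair_000.npy')` returns a `(2, num_samples)` array holding the normalized noisy and clean signals.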

seorim0 commented 2 years ago

> Of course librosa.load returns an array that can be saved, but different wav files yield np.arrays of inconsistent dimensions after loading, so I'd like to see how you handle that. I'm a beginner and would appreciate some pointers. Thanks.

Additionally, instead of creating .npy files, you can use the wav files directly:

Part of `dataloader.py`:

import torch
from torch.utils.data import Dataset

class Wave_Dataset(Dataset):
    def __init__(self, mode):
        # load data
        if mode == 'train':
            print('<Training dataset>')
            print('Load the data...')
            # collect the wav file paths
            self.noisy_dirs = scan_directory(cfg.noisy_dirs_for_train)
            self.clean_dirs = find_pair(self.noisy_dirs)

        elif mode == 'valid':
            print('<Validation dataset>')
            print('Load the data...')
            # collect the wav file paths
            self.noisy_dirs = scan_directory(cfg.noisy_dirs_for_valid)
            self.clean_dirs = find_pair(self.noisy_dirs)

    def __len__(self):
        return len(self.noisy_dirs)

    def __getitem__(self, idx):
        # read the wav pair from disk on each access
        inputs = addr2wav(self.noisy_dirs[idx])
        targets = addr2wav(self.clean_dirs[idx])

        # transform to torch from numpy
        inputs = torch.from_numpy(inputs)
        targets = torch.from_numpy(targets)

        return inputs, targets

The only difference is whether each wav file is read on the fly every time it is used, or read once ahead of time.
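Either way, batching wavs of different lengths (the issue raised in the question) still has to be handled. One common sketch, not part of this repo, is to zero-pad each batch to its longest signal before stacking; `pad_batch` below is a hypothetical helper that could serve as a `collate_fn`:

```python
import numpy as np

def pad_batch(waves):
    """Zero-pad a list of 1-D arrays to the length of the longest one
    and stack them into a (batch, max_len) array."""
    max_len = max(len(w) for w in waves)
    out = np.zeros((len(waves), max_len), dtype=np.float32)
    for i, w in enumerate(waves):
        out[i, :len(w)] = w
    return out

# two wavs of different lengths become one rectangular batch
batch = pad_batch([np.ones(3, dtype=np.float32),
                   np.ones(5, dtype=np.float32)])
```

The same idea, expressed with torch tensors via `torch.nn.utils.rnn.pad_sequence`, can be passed as the `collate_fn` of a `DataLoader` so that `Wave_Dataset` items of varying length batch cleanly.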