seorim0 / DNN-based-Speech-Enhancement-in-the-frequency-domain

DNN-based speech enhancement (SE) in the frequency domain using PyTorch. You can test several state-of-the-art networks using the T-F masking or spectral mapping method.
MIT License

How do I convert wav files to numpy? #2

Closed Chriszhangmw closed 2 years ago

Chriszhangmw commented 2 years ago

Hi, I see that your dataloader uses torch to load np files directly, but the data generation part produces wav files. Could you share the process of converting wav files to npy files? Thanks.

Chriszhangmw commented 2 years ago

Of course librosa.load returns an array that can be saved, but different wav files yield np.arrays of inconsistent dimensions after loading, so I'd like to see how you handle that. I'm a beginner and would appreciate some pointers. Thanks.

seorim0 commented 2 years ago

@Chriszhangmw

Sorry for the late reply.
The process of converting wav files to numpy arrays is as follows:

import librosa
import soundfile

speech_dataset = []

# read each noisy/clean wav pair and resample to the target rate if needed
for addr_speech in noisy_speech_list:
    noisy_speech, fs = soundfile.read(addr_speech[0])
    if fs != cfg.fs:
        noisy_speech = librosa.resample(noisy_speech, orig_sr=fs, target_sr=cfg.fs)
    clean_speech, fs = soundfile.read(addr_speech[1])
    if fs != cfg.fs:
        clean_speech = librosa.resample(clean_speech, orig_sr=fs, target_sr=cfg.fs)
    speech_dataset.append([noisy_speech, clean_speech])

Then normalize speech_dataset.
I hope this answer was helpful :)
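The normalization step and the final save are not shown in the reply above; here is a minimal sketch under the assumption that the pair is peak-normalized by its maximum absolute amplitude (other schemes such as RMS normalization work too) and then written out with `np.save` to produce the .npy file the question asks about. The sample arrays are stand-ins for one `[noisy_speech, clean_speech]` entry:

```python
import numpy as np

# Hypothetical stand-ins for one [noisy, clean] pair from speech_dataset.
noisy = np.array([0.5, -1.0, 0.25], dtype=np.float32)
clean = np.array([0.4, -0.8, 0.2], dtype=np.float32)

# Peak-normalize both signals by the pair's maximum absolute amplitude
# (an assumed choice; the repo does not specify its normalization).
peak = max(np.abs(noisy).max(), np.abs(clean).max())
noisy /= peak
clean /= peak

# Save the pair as a single .npy file that torch can later load.
np.save('pair_000.npy', np.stack([noisy, clean]))
```

After this, `np.load('pair_000.npy')` returns a `(2, num_samples)` array holding the normalized noisy and clean signals.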

seorim0 commented 2 years ago

> Of course librosa.load returns an array that can be saved, but different wav files yield np.arrays of inconsistent dimensions after loading, so I'd like to see how you handle that. I'm a beginner and would appreciate some pointers. Thanks.

Additionally, instead of creating .npy files, you can use the wav files directly:

Part of `dataloader.py`:

import torch
from torch.utils.data import Dataset

class Wave_Dataset(Dataset):
    def __init__(self, mode):
        # load data
        if mode == 'train':
            print('<Training dataset>')
            print('Load the data...')
            # collect the wav file paths
            self.noisy_dirs = scan_directory(cfg.noisy_dirs_for_train)
            self.clean_dirs = find_pair(self.noisy_dirs)

        elif mode == 'valid':
            print('<Validation dataset>')
            print('Load the data...')
            # collect the wav file paths
            self.noisy_dirs = scan_directory(cfg.noisy_dirs_for_valid)
            self.clean_dirs = find_pair(self.noisy_dirs)

    def __len__(self):
        return len(self.noisy_dirs)

    def __getitem__(self, idx):
        # read the wav pair from disk on each access
        inputs = addr2wav(self.noisy_dirs[idx])
        targets = addr2wav(self.clean_dirs[idx])

        # transform to torch from numpy
        inputs = torch.from_numpy(inputs)
        targets = torch.from_numpy(targets)

        return inputs, targets

The only difference is whether each wav file is read on the fly every time it is used, or read once ahead of time.
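Either way, batching wavs of different lengths (the issue raised in the question) still has to be handled. One common sketch, not part of this repo, is to zero-pad each batch to its longest signal before stacking; `pad_batch` below is a hypothetical helper that could serve as a `collate_fn`:

```python
import numpy as np

def pad_batch(waves):
    """Zero-pad a list of 1-D arrays to the length of the longest one
    and stack them into a (batch, max_len) array."""
    max_len = max(len(w) for w in waves)
    out = np.zeros((len(waves), max_len), dtype=np.float32)
    for i, w in enumerate(waves):
        out[i, :len(w)] = w
    return out

# two wavs of different lengths become one rectangular batch
batch = pad_batch([np.ones(3, dtype=np.float32),
                   np.ones(5, dtype=np.float32)])
```

The same idea, expressed with torch tensors via `torch.nn.utils.rnn.pad_sequence`, can be passed as the `collate_fn` of a `DataLoader` so that `Wave_Dataset` items of varying length batch cleanly.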