Closed Chriszhangmw closed 2 years ago
Of course librosa.load returns an array that can be saved, but different wav files give np.arrays of different lengths after loading, so I'd like to see how you handle this. I'm a beginner; I hope you can give me some pointers. Thanks!
@Chriszhangmw
Sorry for the late reply.
The process of converting the wav files to numpy arrays is as follows:

    import soundfile
    import librosa

    speech_dataset = []
    # read the noisy/clean wav pairs
    for addr_speech in noisy_speech_list:
        # addr_speech = (noisy wav path, clean wav path)
        noisy_speech, fs = soundfile.read(addr_speech[0])
        if fs != cfg.fs:
            noisy_speech = librosa.resample(noisy_speech, fs, cfg.fs)
        clean_speech, fs = soundfile.read(addr_speech[1])
        if fs != cfg.fs:
            clean_speech = librosa.resample(clean_speech, fs, cfg.fs)
        speech_dataset.append([noisy_speech, clean_speech])
And normalize the speech_dataset.
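The issue does not say which normalization was used, so here is a minimal sketch under one common assumption: peak normalization, with the same gain applied to each noisy/clean pair so the pair stays aligned.

```python
import numpy as np

def normalize_pair(noisy, clean):
    """Peak-normalize a noisy/clean pair by the noisy signal's peak.

    This is an assumed choice for illustration; other schemes
    (e.g. RMS normalization) would work the same way.
    """
    peak = np.max(np.abs(noisy))
    if peak > 0:
        noisy = noisy / peak
        clean = clean / peak  # same gain keeps the pair consistent
    return noisy, clean

# example on synthetic data
noisy = np.array([0.5, -2.0, 1.0])
clean = np.array([0.25, -1.0, 0.5])
n, c = normalize_pair(noisy, clean)  # noisy peak becomes 1.0
```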
I hope this answer was helpful :)
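The saving step itself is not shown in the thread; since the pairs have different lengths, one way to write them to a single .npy file (a sketch, not the repo's actual code) is a ragged object array:

```python
import numpy as np

# hypothetical example data: two noisy/clean pairs of different lengths
speech_dataset = [
    [np.zeros(16000, dtype=np.float32), np.zeros(16000, dtype=np.float32)],  # 1 s at 16 kHz
    [np.zeros(8000, dtype=np.float32), np.zeros(8000, dtype=np.float32)],    # 0.5 s
]

# Different lengths mean the outer container must be an object
# array rather than a regular 2-D float array.
arr = np.empty((len(speech_dataset), 2), dtype=object)
for i, (noisy, clean) in enumerate(speech_dataset):
    arr[i, 0] = noisy
    arr[i, 1] = clean

np.save('speech_dataset.npy', arr)

# object arrays require allow_pickle=True when loading
loaded = np.load('speech_dataset.npy', allow_pickle=True)
```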
Additionally, instead of creating .npy files, you can use the wav files directly:
Part of dataloader.py
    class Wave_Dataset(Dataset):
        def __init__(self, mode):
            # load data
            if mode == 'train':
                print('<Training dataset>')
                print('Load the data...')
                # load the wav addr
                self.noisy_dirs = scan_directory(cfg.noisy_dirs_for_train)
                self.clean_dirs = find_pair(self.noisy_dirs)
            elif mode == 'valid':
                print('<Validation dataset>')
                print('Load the data...')
                # load the wav addr
                self.noisy_dirs = scan_directory(cfg.noisy_dirs_for_valid)
                self.clean_dirs = find_pair(self.noisy_dirs)

        def __len__(self):
            return len(self.noisy_dirs)

        def __getitem__(self, idx):
            # read the wav
            inputs = addr2wav(self.noisy_dirs[idx])
            targets = addr2wav(self.clean_dirs[idx])

            # transform to torch from numpy
            inputs = torch.from_numpy(inputs)
            targets = torch.from_numpy(targets)

            return inputs, targets
The only difference is whether each wav file is read on demand (in __getitem__) or read ahead of time into memory.
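That trade-off can be sketched without torch at all; the class and function names below are hypothetical, used only to show when the file I/O happens in each approach:

```python
class EagerDataset:
    """Reads everything at construction time (the .npy route)."""
    def __init__(self, paths, read_fn):
        # all I/O happens here, once
        self.data = [read_fn(p) for p in paths]

    def __getitem__(self, idx):
        return self.data[idx]  # no I/O at training time


class LazyDataset:
    """Stores only paths; reads on every access (the wav route)."""
    def __init__(self, paths, read_fn):
        self.paths = paths
        self.read_fn = read_fn

    def __getitem__(self, idx):
        return self.read_fn(self.paths[idx])  # I/O on each access


# demo: count how often the "file" is actually read
calls = []
def fake_read(path):
    calls.append(path)
    return path.upper()

eager = EagerDataset(['a.wav', 'b.wav'], fake_read)  # 2 reads up front
lazy = LazyDataset(['a.wav', 'b.wav'], fake_read)    # no reads yet
```

Eager loading trades memory for speed; lazy loading keeps memory flat but repeats disk reads every epoch.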
Hi, I see that your dataloader uses torch to load np files directly, but the data generation part produces wav files. Could you share how you convert the wav files to npy files? Thanks!