Memory error - GitHub issues

ZZZJane commented 6 years ago

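Hello, the program runs fine on a small amount of data, but once I increase the amount of data I get memory errors. (Note: the experiment mixes 4,000 clean utterances with 10 noise types at 0 dB to form the training set, i.e. 40,000 items.)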

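① The pack_features step ran out of memory in its temporary storage, so I changed it to append to the .h5 file in batches, which avoided that error. In the train stage, however, all of x has to be read from data.h5 into a single np.array before random batches are generated, and reading into the array fails again because of memory. As I understand it, the memory errors all occur while data is held in temporary storage. Is it possible to skip that step and generate random batches directly from data.h5? Or how do you handle reading and storing large amounts of data?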

② In the compute_scaler function of prepare_data.py, storing the data read from data.h5 in an np.array raises a memory error (maximum memory exceeded):

    with h5py.File(hdf5_path, 'r') as hf:
        x = hf.get('x')
        x = np.array(x)

However, the next step, which computes the standardization parameters, needs all of the data in x at once:

    x2d = x.reshape((n_segs * n_concat, n_freq))
    scaler = preprocessing.StandardScaler(with_mean=True, with_std=True).fit(x2d)

How can I obtain the scaler when working with a large amount of data?
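One way to get the same statistics without materializing x is to accumulate them chunk by chunk. The sketch below is not from the sednn code: the function name chunked_mean_std and the chunk_size parameter are hypothetical, and it assumes x has shape (n_segs, n_concat, n_freq) and supports slicing along the first axis, as both NumPy arrays and h5py datasets do. It merges per-chunk means and squared-deviation sums, which gives the per-frequency mean and population standard deviation that a scaler fitted on the full x2d would hold. scikit-learn's StandardScaler also offers partial_fit, which could be called on each chunk instead.

```python
import numpy as np


def chunked_mean_std(x, n_freq, chunk_size=1024):
    """Per-frequency mean and population std of x, read chunk_size segments at a time.

    x is assumed to have shape (n_segs, n_concat, n_freq) and to support
    slicing along axis 0 (a NumPy array or an open h5py dataset).
    """
    count = 0
    mean = np.zeros(n_freq, dtype=np.float64)
    m2 = np.zeros(n_freq, dtype=np.float64)  # running sum of squared deviations

    for start in range(0, x.shape[0], chunk_size):
        # Only this slice is loaded into memory.
        chunk = np.asarray(x[start:start + chunk_size], dtype=np.float64)
        chunk = chunk.reshape(-1, n_freq)

        n_b = chunk.shape[0]
        mean_b = chunk.mean(axis=0)
        m2_b = ((chunk - mean_b) ** 2).sum(axis=0)

        # Merge the chunk statistics into the running statistics.
        delta = mean_b - mean
        total = count + n_b
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta ** 2 * count * n_b / total
        count = total

    std = np.sqrt(m2 / count)
    std[std == 0.0] = 1.0  # guard against division by zero when scaling
    return mean, std
```

With the file from the question, this might be called inside the `with h5py.File(hdf5_path, 'r') as hf:` block as `chunked_mean_std(hf['x'], n_freq)`, so that only chunk_size segments are in memory at any time.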

③ In the inference function:

    if scale:
        mixed_x = pp_data.scale_on_2d(mixed_x, scaler)
        speech_x = pp_data.scale_on_2d(speech_x, scaler)

Here scaler is computed from the noisy speech of the training set. Why is the test set standardized with a scaler computed from the training set?
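The thread does not answer question ③, but the usual reasoning is that the network's weights were fitted to inputs normalized with training-set statistics, so inference has to apply the identical transform, and test-set statistics are treated as unknown at inference time. A small illustration with synthetic data follows; all names and values are hypothetical and unrelated to the sednn features.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "features": the test data is shifted relative to the training data.
train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
test = rng.normal(loc=6.0, scale=2.0, size=(200, 3))

# Statistics are estimated on the training set only.
train_mean = train.mean(axis=0)
train_std = train.std(axis=0)


def scale(x, mean, std):
    """Standardize x with the given statistics."""
    return (x - mean) / std


# Same transform the model saw during training: the shift in the test data
# is preserved, so the model sees it as a genuine difference in the input.
test_scaled = scale(test, train_mean, train_std)

# Scaling the test set with its own statistics maps it to zero mean,
# which is a different transform from the one used in training.
test_self_scaled = scale(test, test.mean(axis=0), test.std(axis=0))
```

Here test_scaled keeps an offset of roughly half a standard deviation, while test_self_scaled is centered at zero, so the two would present different inputs to the same trained network.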

I look forward to your reply and guidance. Thank you!

qiuqiangkong commented 6 years ago

Hello! In this situation you could try:

Reducing the amount of data (the easiest to try)
Reading the data in batches
Buying more memory
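
The "reading the data in batches" option can be sketched as a generator that pulls one random batch at a time from the datasets instead of copying everything into an np.array first. This is an illustration rather than the repository's code: iterate_random_batches and its parameters are hypothetical names, and it assumes x and y share their first dimension and accept an increasing integer index array, which holds for NumPy arrays and for open h5py datasets.

```python
import numpy as np


def iterate_random_batches(x, y, batch_size, rng=None):
    """Yield (x_batch, y_batch) pairs covering the data once in random order.

    x and y may be NumPy arrays or open h5py datasets; only one batch
    is read into memory at a time.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    order = rng.permutation(n)  # random order over segment indices

    for start in range(0, n, batch_size):
        # h5py expects index arrays in increasing order, so sort within the batch.
        idx = np.sort(order[start:start + batch_size])
        yield np.asarray(x[idx]), np.asarray(y[idx])
```

Passing hf['x'] and hf['y'] from an open h5py.File keeps only one batch in memory; the dataset name 'y' is assumed here. Random access of this kind can be slow on disk, so larger batches or shuffling at chunk level may be worth trying.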

Best wishes,

Qiuqiang

(Replying by email to the message from ZZZJane sent 29 June 2018 09:15:18, subject: [yongxuUSTC/sednn] memory error (#14).)


ZZZJane commented 6 years ago

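Thank you very much for your reply; I will give it a try. At the moment the recovery results are still not good. Increasing the amount of data has not helped, and the loss even grows as the training data increases. All parameters are the ones from the original program. For pre-training I used the trained network provided in clean2clean.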

Training set: 4620 utterances × 4 noise types at 0 dB = 18480.
Test set: 1680 utterances × (4 + 2 noise types) at 0 dB = 10080.

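2000 noisy utterances randomly selected from the training set, 3000 training iterations: training loss 0.673188. 1000 noisy utterances randomly selected from the test set: test loss 0.975837.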

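16000 noisy utterances randomly selected from the training set, 3000 training iterations: training loss 0.715047. 8000 noisy utterances randomly selected from the test set: test loss 0.93619.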

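As shown above, as the training set grows the training loss increases and the test loss decreases, but the values are still large and change little, and PESQ is below 2 in every case. Do you have any suggestions? Thank you very much; I look forward to your reply.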

(In reply to the message from qiuqiangkong of 2018-07-05 01:33:27.)


qiuqiangkong commented 6 years ago

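Hello,

clean2clean is for verifying the clean-to-clean code.

If you want to verify noisy-to-clean, you can try running the mixture2clean_dnn code. At 0 dB, PESQ will be above 2.

Also, the PESQ tool has a bug: the path passed to PESQ must not be too long, otherwise the PESQ value will be very low.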

Best wishes,

Qiuqiang

(Replying by email to the message from ZZZJane sent 07 July 2018 09:34:53, subject: Re: [yongxuUSTC/sednn] memory error (#14).)


ZZZJane commented 6 years ago

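Thank you very much for taking the time to reply despite your busy schedule. I will adjust the experimental setup and try again. Wishing you a happy life and smooth work!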

(In reply to the message from qiuqiangkong of 2018-07-07 18:48:37.)


yongxuUSTC / sednn

Memory error #14