yeyupiaoling / PPASR

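End-to-end Chinese speech recognition built on PaddlePaddle, from getting started to real-world use: very simple introductory examples and practical enterprise projects. Supports the currently popular DeepSpeech2, Conformer, and Squeezeformer models.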
Apache License 2.0
816 stars, 129 forks

Error when generating the data lists #47

Closed: Shehold closed this issue 2 years ago

Shehold commented 2 years ago

Hey, I deployed this on AI Studio and got the following error when running create_data. I'd appreciate any help, thanks!

annotation_path: dataset/annotation/
count_threshold: 2
dataset_vocab: dataset/vocabulary.txt
feature_method: linear
is_change_frame_rate: True
max_test_manifest: 10000
mean_std_path: dataset/mean_std.npz
noise_manifest_path: dataset/manifest.noise
noise_path: dataset/audio/noise
num_samples: 1000000
num_workers: 8
test_manifest: dataset/manifest.test
train_manifest: dataset/manifest.train
------------------------------------------------
Starting to generate the data lists...
  0%|          | 0/7176 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "create_data.py", line 39, in <module>
    max_test_manifest=args.max_test_manifest)
  File "/home/aistudio/ppasr/trainer.py", line 110, in create_data
    max_test_manifest=max_test_manifest)
  File "/home/aistudio/ppasr/utils/utils.py", line 61, in create_manifest
    change_rate(audio_path)
  File "/home/aistudio/ppasr/utils/utils.py", line 105, in change_rate
    data, sr = soundfile.read(audio_path)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py", line 257, in read
    subtype, endian, format, closefd) as f:
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py", line 1184, in _open
    "Error opening {0!r}: ".format(self.name))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'aset/audio/data_aishell/wav/test/S0764/BAC009S0764W0480.wav': System error.
yeyupiaoling commented 2 years ago

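Look at the error on the last line: the path is missing its first three letters and should start with "dataset". The script originally cuts a leading "../" off each path, so you have probably modified something.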
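To see why the error path begins with "aset/", here is a minimal illustration. It assumes, as the reply describes, that the list-generation step removes "../" by slicing off a fixed three characters; the function name is hypothetical and this is not the repository's actual code.

```python
# Hypothetical illustration (not PPASR code): a fixed-length slice that is
# meant to drop a leading "../" from each audio path.

def trim_prefix_by_slicing(path: str) -> str:
    # Unconditionally removes the first three characters.
    return path[3:]


original = "../dataset/audio/data_aishell/wav/test/S0764/BAC009S0764W0480.wav"
edited = "dataset/audio/data_aishell/wav/test/S0764/BAC009S0764W0480.wav"

# With the "../" still present, the slice yields the intended relative path.
print(trim_prefix_by_slicing(original))
# dataset/audio/data_aishell/wav/test/S0764/BAC009S0764W0480.wav

# If "../" was already deleted by hand, the slice eats "dat" instead.
print(trim_prefix_by_slicing(edited))
# aset/audio/data_aishell/wav/test/S0764/BAC009S0764W0480.wav
```

The second output matches the path in the RuntimeError, which is why soundfile cannot open the file.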

Shehold commented 2 years ago


Yes, I had already deleted the "../" part. If I want to undo the trimming, where do I change it?

yeyupiaoling commented 2 years ago

The trimming is done here; just restore it:

https://github.com/yeyupiaoling/PPASR/blob/d32dd2e8e883d79f86ab239e5c55039d5fe31ff5/download_data/thchs_30.py#L35
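Restoring the original line is the simplest fix. If you would rather have the path handling tolerate paths that already had "../" removed, one option is a conditional strip instead of a fixed-length slice. This is a minimal sketch assuming the prefix is literally "../"; strip_parent_prefix is a hypothetical helper, not part of PPASR, and it avoids str.removeprefix so it also runs on the Python 3.7 environment shown in the traceback.

```python
# Hypothetical helper (not part of PPASR): drop a leading "../" only when it
# is actually present, so already-edited paths are left untouched.

def strip_parent_prefix(path: str, prefix: str = "../") -> str:
    """Return path without a leading prefix; other paths are returned as-is."""
    if path.startswith(prefix):
        return path[len(prefix):]
    return path


# Both spellings of the path now resolve to the same relative location.
print(strip_parent_prefix("../dataset/audio/data_aishell/wav/test/S0764/BAC009S0764W0480.wav"))
print(strip_parent_prefix("dataset/audio/data_aishell/wav/test/S0764/BAC009S0764W0480.wav"))
```

Checking the prefix makes the step idempotent: running it on a path that was already cleaned by hand no longer corrupts it.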