yeyupiaoling / MASR

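A streaming and non-streaming automatic speech recognition framework implemented in PyTorch that supports both online and offline recognition. It currently supports the Conformer, Squeezeformer, and DeepSpeech2 models, along with a variety of data augmentation methods.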
Apache License 2.0

Dimension error in the forward pass after choosing a different audio preprocessing method #49

Closed: jackjieliu closed this issue 1 year ago

jackjieliu commented 1 year ago

Hi Yeyu (夜雨), I ran into the following errors while running this project.

Problem description: when I choose MFCC to process the audio, I get this error:

    Traceback (most recent call last):
      File "infer_path.py", line 35, in <module>
        predictor = Predictor(model_path=args.model_path, vocab_path=args.vocab_path, use_model=args.use_model,
      File "C:\Users\Administrator\PycharmProjects\masr\MASR\masr\predict.py", line 101, in __init__
        self.predict(warmup_audio_path, to_an=False)
      File "C:\Users\Administrator\PycharmProjects\masr\MASR\masr\predict.py", line 173, in predict
        output_data, _, _ = self.predictor(audio_data, audio_len, init_state_h_box, init_state_c_box)
      File "C:\Users\Administrator\PycharmProjects\masr\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\Administrator\PycharmProjects\masr\MASR\masr\model_utils\utils.py", line 33, in forward
        x = self.normalizer(audio)
      File "C:\Users\Administrator\PycharmProjects\masr\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\Administrator\PycharmProjects\masr\MASR\masr\model_utils\utils.py", line 19, in forward
        x = (x - self.mean) / (self.std + self.eps)
    RuntimeError: The size of tensor a (39) must match the size of tensor b (161) at non-singleton dimension 1

When I choose fbank to process the audio, I get this error:

    Traceback (most recent call last):
      File "infer_path.py", line 35, in <module>
        predictor = Predictor(model_path=args.model_path, vocab_path=args.vocab_path, use_model=args.use_model,
      File "C:\Users\Administrator\PycharmProjects\masr\MASR\masr\predict.py", line 101, in __init__
        self.predict(warmup_audio_path, to_an=False)
      File "C:\Users\Administrator\PycharmProjects\masr\MASR\masr\predict.py", line 173, in predict
        output_data, _, _ = self.predictor(audio_data, audio_len, init_state_h_box, init_state_c_box)
      File "C:\Users\Administrator\PycharmProjects\masr\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\Administrator\PycharmProjects\masr\MASR\masr\model_utils\utils.py", line 33, in forward
        x = self.normalizer(audio)
      File "C:\Users\Administrator\PycharmProjects\masr\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\Administrator\PycharmProjects\masr\MASR\masr\model_utils\utils.py", line 19, in forward
        x = (x - self.mean) / (self.std + self.eps)
    RuntimeError: The size of tensor a (120) must match the size of tensor b (161) at non-singleton dimension 1

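How can I fix this? I'd also like to ask: do MFCC or fbank features always give better results than the linear spectrogram? I'd appreciate any clarification.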
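Both tracebacks end in the normalizer at line 19 of `masr/model_utils/utils.py`: its mean/std statistics have 161 entries, and broadcasting fails when the incoming features have 39 (MFCC) or 120 (fbank) values per frame. The sketch below reproduces that failure with a hypothetical `FeatureNormalizer` that only mirrors the expression shown in the traceback; it is not MASR's actual class. The `(batch, frames, feature_dim)` layout and the `eps` value are assumptions, so the dimension index in the printed message may differ from the one in the traceback.

```python
import torch
from torch import nn


class FeatureNormalizer(nn.Module):
    """Hypothetical stand-in for MASR's normalizer: per-dimension
    mean/std statistics computed on the training features."""

    def __init__(self, mean: torch.Tensor, std: torch.Tensor, eps: float = 1e-5):
        super().__init__()
        # Buffers move with the module between devices but are not trained.
        self.register_buffer("mean", mean)
        self.register_buffer("std", std)
        self.eps = eps  # assumed value, not taken from MASR

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same expression as the traceback; broadcasting requires the
        # last dimension of x to equal the number of statistics.
        return (x - self.mean) / (self.std + self.eps)


# Statistics as if saved with a model trained on 161-bin linear spectrograms.
normalizer = FeatureNormalizer(torch.zeros(161), torch.ones(161))

linear_feats = torch.randn(1, 100, 161)  # (batch, frames, feature_dim), assumed layout
mfcc_feats = torch.randn(1, 100, 39)

print(normalizer(linear_feats).shape)  # torch.Size([1, 100, 161])

try:
    normalizer(mfcc_feats)
except RuntimeError as err:
    # Same class of error as in the issue: 39 features vs. 161 statistics.
    print("RuntimeError:", err)
```

Only the 161-dimensional input passes. Since the statistics are stored with the model, MFCC or fbank input can only work with a model (and statistics) trained on that same preprocessing.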

yeyupiaoling commented 1 year ago

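It looks like you downloaded a pretrained model and then used a different preprocessing method with it. Different preprocessing methods produce features of different sizes, so the preprocessing you use has to match the one the model was trained with. Based on my earlier experiments, the accuracy is actually about the same, although linear preprocessing does come out slightly higher.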
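The maintainer's point can be turned into an early, readable check. This is a minimal sketch with hypothetical helpers (`OBSERVED_FEATURE_DIMS`, `linear_spectrogram_dim`, `check_feature_dim`) that are not part of MASR. The 39 and 120 sizes are taken from the tracebacks in this issue; 161 is consistent with a one-sided FFT over 320 samples (for example, an assumed 20 ms window at 16 kHz), since an n-point FFT yields n/2 + 1 frequency bins.

```python
# Hypothetical table: per-frame feature sizes reported in this issue.
OBSERVED_FEATURE_DIMS = {"linear": 161, "mfcc": 39, "fbank": 120}


def linear_spectrogram_dim(sample_rate: int, window_ms: float) -> int:
    """Number of bins of a one-sided FFT over one analysis window."""
    n_fft = int(sample_rate * window_ms / 1000)
    return n_fft // 2 + 1


def check_feature_dim(preprocess: str, normalizer_dim: int) -> None:
    """Hypothetical guard: fail early if the chosen preprocessing does not
    match the number of statistics the model was trained with."""
    feature_dim = OBSERVED_FEATURE_DIMS[preprocess]
    if feature_dim != normalizer_dim:
        raise ValueError(
            f"'{preprocess}' features have {feature_dim} dims but the model's "
            f"normalizer expects {normalizer_dim}; use the preprocessing the "
            f"model was trained with, or retrain with '{preprocess}'."
        )


# Assumed 16 kHz audio with a 20 ms window gives the 161 bins in the traceback.
print(linear_spectrogram_dim(16000, 20))  # 161
check_feature_dim("linear", 161)  # passes silently
```

Calling `check_feature_dim("mfcc", 161)` raises a `ValueError` that names both sizes, instead of the broadcasting error from deep inside `forward`. In a real setup the expected size would be read from the loaded model's normalizer statistics rather than hard-coded.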