shibing624 / parrots

Automatic Speech Recognition (ASR), Text-To-Speech (TTS) engine. Chinese and English speech recognition, multi-speaker speech synthesis, multilingual support, high accuracy.
Apache License 2.0
474 stars 88 forks

Low speech-to-text recognition accuracy #21

Closed MentosL closed 1 year ago

MentosL commented 1 year ago

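Environment: Windows 10 Pro

Problem: After setting up the environment, I ran the demo with both the sample from example and my own recordings:

Example: [image]

With my own recordings the result is the same: only the first sound is recognized, and nothing after it.

Goal: I'd like to ask whether the transcription accuracy, which currently has problems, can be improved in the future.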

minskiter commented 1 year ago

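The model failed to load. I see the same behavior on macOS, with a message that the model cannot be opened: DATA_LOSS. I have never used tensorflow, so I don't know whether the cause is that it defaults to the GPU while only the CPU is available here?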

2023-06-17 17:57:08.900101: W tensorflow/core/util/tensor_slice_reader.cc:97] Could not open /Users/minskiter/miniconda3/envs/voice/lib/python3.8/site-packages/parrots/data/speech_model/speech_recognition.model: DATA_LOSS: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2023-06-17 17:57:08.919118: W tensorflow/core/util/tensor_slice_reader.cc:97] Could not open /Users/minskiter/miniconda3/envs/voice/lib/python3.8/site-packages/parrots/data/speech_model/speech_recognition.model.base: DATA_LOSS: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? 
AlucardNosferatu commented 1 year ago

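Renaming the .model file to .h5, and the .base one to .base.h5, lets the model load normally (at least without errors), but the result is the same: only the first two pinyin syllables are recognized.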
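
One possible reading of why the rename helps, which nothing in this thread confirms, is that the files are HDF5 weights that get read as a TensorFlow checkpoint when the name lacks an .h5 extension. A minimal sketch for checking that assumption is to compare the first bytes of a file against the standard HDF5 signature. The helper name `looks_like_hdf5` and the throwaway demo files are hypothetical; in practice you would point it at the speech_recognition.model path from the log above.

```python
import os
import tempfile

# Standard 8-byte signature found at the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Hypothetical helper for illustration; it only inspects offset 0.
    """
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demo on two throwaway files (NOT the real parrots model files).
with tempfile.TemporaryDirectory() as tmp:
    h5_like = os.path.join(tmp, "fake.model")
    other = os.path.join(tmp, "other.model")
    with open(h5_like, "wb") as f:
        f.write(HDF5_SIGNATURE + b"payload")
    with open(other, "wb") as f:
        f.write(b"not hdf5")
    h5_result = looks_like_hdf5(h5_like)
    other_result = looks_like_hdf5(other)

print(h5_result, other_result)  # True False
```

If the real model file returns True, the DATA_LOSS warning would be consistent with a format mismatch rather than a corrupted download, though that still would not explain the truncated recognition result.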

AlucardNosferatu commented 1 year ago

The author's online demo behaves the same way, so it is at least using the same model as the pip package (https://www.mulanai.com/product/asr/#trial).

AlucardNosferatu commented 1 year ago

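Looking at predict, the data copied into x_in fills only a small part of it (349/1600) and the rest is all zeros. Also, predict uses base_model rather than _model. I'm not sure whether either of these is related.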
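
To make the 349/1600 observation concrete, here is a rough sketch of zero-padding a variable-length feature sequence into a fixed-size buffer. It is not the parrots code: the function names, the feature dimension of 4, and the dummy frames are assumptions for illustration, and only the 349 and 1600 figures come from the observation above.

```python
def pad_to_fixed_length(frames, max_len, feat_dim):
    """Copy `frames` into a zero-filled buffer of `max_len` rows.

    Hypothetical helper; raises if the input is longer than the buffer.
    """
    if len(frames) > max_len:
        raise ValueError("input has more frames than the buffer can hold")
    padded = [[0.0] * feat_dim for _ in range(max_len)]
    for i, frame in enumerate(frames):
        padded[i] = list(frame)
    return padded


def filled_fraction(padded):
    """Fraction of rows that contain at least one non-zero value."""
    filled = sum(1 for row in padded if any(v != 0.0 for v in row))
    return filled / len(padded)


# 349 dummy frames placed into a 1600-row buffer (feat_dim=4 is assumed).
frames = [[1.0] * 4 for _ in range(349)]
padded = pad_to_fixed_length(frames, max_len=1600, feat_dim=4)
fraction = filled_fraction(padded)
print(len(padded), round(fraction, 3))  # 1600 0.218
```

Padding to a fixed length is common for models with a fixed input size, so the zeros on their own may not be the cause. It might be worth checking whether the true input length (349 here) is passed along when the output is decoded.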