A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Notice: ffmpeg is not installed. torchaudio is used to load audio
If you want to use ffmpeg backend to load audio, please install it by:
sudo apt install ffmpeg # ubuntu
brew install ffmpeg # mac
Key Conformer already exists in model_classes, re-register
Key Linear already exists in adaptor_classes, re-register
Key TransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolution2DTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolution2DTransformerDecoder already exists in decoder_classes, re-register
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
You are using the latest version of funasr-1.1.14
2024-11-18 15:23:47,114 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2024-11-18 15:23:50,544 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\Python\lib\site-packages\funasr\utils\load_utils.py", line 93, in load_audio_text_image_video
    data_or_path_or_list, audio_fs = torchaudio.load(data_or_path_or_list)
  File "D:\Python\lib\site-packages\torchaudio\_backend\utils.py", line 203, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "D:\Python\lib\site-packages\torchaudio\_backend\soundfile.py", line 26, in load
    return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
  File "D:\Python\lib\site-packages\torchaudio\_backend\soundfile_backend.py", line 221, in load
    with soundfile.SoundFile(filepath, "r") as file:
  File "D:\Python\lib\site-packages\soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "D:\Python\lib\site-packages\soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'D:\demo\demo_1\recording.wav': Format not recognised.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\demo\语音识别.py", line 60, in <module>
    res = model.generate(
  File "D:\Python\lib\site-packages\funasr\auto\auto_model.py", line 304, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "D:\Python\lib\site-packages\funasr\auto\auto_model.py", line 377, in inference_with_vad
    res = self.inference(
  File "D:\Python\lib\site-packages\funasr\auto\auto_model.py", line 343, in inference
    res = model.inference(**batch, **kwargs)
  File "D:\Python\lib\site-packages\funasr\models\fsmn_vad_streaming\model.py", line 676, in inference
    audio_sample_list = load_audio_text_image_video(
  File "D:\Python\lib\site-packages\funasr\utils\load_utils.py", line 72, in load_audio_text_image_video
    return [
  File "D:\Python\lib\site-packages\funasr\utils\load_utils.py", line 73, in <listcomp>
    load_audio_text_image_video(
  File "D:\Python\lib\site-packages\funasr\utils\load_utils.py", line 97, in load_audio_text_image_video
    data_or_path_or_list = _load_audio_ffmpeg(data_or_path_or_list, sr=fs)
  File "D:\Python\lib\site-packages\funasr\utils\load_utils.py", line 213, in _load_audio_ffmpeg
    out = run(cmd, capture_output=True, check=True).stdout
  File "D:\Python\lib\subprocess.py", line 501, in run
    with Popen(*popenargs, **kwargs) as process:
  File "D:\Python\lib\subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "D:\Python\lib\subprocess.py", line 1438, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified.
0%| | 0/1 [00:00<?, ?it/s]
This is the error.
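
The log above shows two separate failures: libsndfile rejects the file with "Format not recognised", and the ffmpeg fallback then fails with WinError 2, which matches the earlier notice that ffmpeg is not installed. A minimal diagnostic sketch for both conditions follows. The helper names `ffmpeg_available` and `sniff_audio_container` are hypothetical and not part of funasr, and the header check is only a rough heuristic based on well-known magic bytes.

```python
import shutil
from pathlib import Path


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def sniff_audio_container(path) -> str:
    """Guess the container format from the first bytes of a file (heuristic)."""
    with open(path, "rb") as f:
        head = f.read(12)
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"fLaC":
        return "flac"
    if head[:3] == b"ID3":
        return "mp3"
    if head[4:8] == b"ftyp":
        return "mp4/m4a"
    if head[:4] == b"\x1aE\xdf\xa3":  # EBML magic used by WebM/Matroska
        return "webm/matroska"
    return "unknown"


if __name__ == "__main__":
    audio = Path(r"D:\demo\demo_1\recording.wav")
    print("ffmpeg on PATH:", ffmpeg_available())
    if audio.is_file():
        print("container guess:", sniff_audio_container(audio))
    else:
        print("file not found:", audio)
```

If the guess is anything other than "wav", the file may have a .wav extension without actually being a WAV file, which would be consistent with the libsndfile error. In that case ffmpeg has to be installed and on PATH for the fallback loader to work.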
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

# Load SenseVoiceSmall together with the FSMN VAD model
model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# Raw string, so the "\r" in "\recording.wav" is not read as a carriage return
res = model.generate(
    input=r"D:\demo\demo_1\recording.wav",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

The above is the code I ran.
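
One thing worth ruling out with a Windows path like the one passed to `input=`: in an ordinary (non-raw) string literal, `\r` is the carriage-return escape, so `\recording.wav` does not contain a backslash followed by `r`. Whether that is what happened here is not certain, since the pasted text may have lost characters. The sketch below only illustrates the pitfall and two ways to avoid it; the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# Equivalent to writing "D:\demo\demo_1\recording.wav" as a normal literal:
# "\d" is left alone by Python, but "\r" becomes a carriage return.
as_written = "D:\\demo\\demo_1\recording.wav"

# A raw string keeps every backslash literally.
raw = r"D:\demo\demo_1\recording.wav"

# Building the path from parts avoids backslash escapes altogether.
built = PureWindowsPath("D:/", "demo", "demo_1", "recording.wav")

print(repr(as_written))
print(repr(raw))
print("contains carriage return:", "\r" in as_written)
print("raw equals built path:", str(built) == raw)
```

Using a raw string or forward slashes for the path passed to `model.generate` removes this source of doubt before looking at the audio format itself.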