modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com
Other
6.45k stars, 687 forks

Support Whisper-v3-large-turbo #2132

Open MonolithFoundation opened 6 days ago

MonolithFoundation commented 6 days ago

Support Whisper-v3-large-turbo

LauraGPT commented 5 days ago

Please update to funasr 1.1.12:

https://github.com/modelscope/FunASR/tree/main/examples/industrial_data_pretraining/whisper

MonolithFoundation commented 5 days ago

Thanks for the quick response!

MonolithFoundation commented 5 days ago

s exceeded with url: /funasr/ (Caused by SSLError(SSLError(1, '[SSL] record layer failure (_ssl.c:1006)'))) - skipping
ERROR: Could not find a version that satisfies the requirement funasr==1.1.12 (from versions: 0.3.1, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.6, 0.4.7, 0.4.8, 0.5.0, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.5, 0.5.6, 0.5.8, 0.6.0, 0.6.1, 0.6.2, 0.6.3, 0.6.4, 0.6.5, 0.6.6, 0.6.7, 0.6.9, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.7.5, 0.7.6, 0.7.7, 0.7.8, 0.7.9, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.6, 0.8.7, 0.8.8, 1.0.0, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 1.0.10, 1.0.11, 1.0.12, 1.0.14, 1.0.15, 1.0.16, 1.0.17, 1.0.18, 1.0.19, 1.0.20, 1.0.21, 1.0.22, 1.0.23, 1.0.24, 1.0.25, 1.0.26, 1.0.27, 1.0.28, 1.0.29, 1.0.30, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.8, 1.1.9, 1.1.11)
ERROR: No matching distribution found for funasr==1.1.12
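The index pip queried above only offered releases up to 1.1.11, so `funasr==1.1.12` could not be resolved from it. Once FunASR is installed some other way (for example from the git repository), a minimal sketch like the following can confirm that the installed version is at least 1.1.12. It uses the standard-library `importlib.metadata`; the helper names `parse_version` and `is_at_least` are hypothetical, and the parser assumes plain dotted numeric versions, ignoring suffixes such as `.dev0` or `+local`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '1.1.12' (or '1.1.12.dev0') into (1, 1, 12); only the leading dotted digits count."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a dotted numeric version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, required: str) -> bool:
    """Compare numerically (so 1.1.9 < 1.1.12), padding the shorter version with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("funasr")
    except PackageNotFoundError:
        print("funasr is not installed")
    else:
        status = "OK" if is_at_least(installed, "1.1.12") else "older than 1.1.12"
        print(f"funasr {installed}: {status}")
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.1.9" would sort after "1.1.12".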

MonolithFoundation commented 5 days ago
MonolithFoundation commented 5 days ago

After installing from git:

LauraGPT commented 5 days ago

Please update FunASR again and retry: https://github.com/modelscope/FunASR/commit/cd684580991661b9a088361bea2d7f00735178d3

slin000111 commented 5 days ago

> After installing from git:

Authentication token does not exist, failed to access model Whisper-large-v3-turbo which may not exist or may be private. Please login first.
modelscope login --token YOUR_MODELSCOPE_SDK_TOKEN

You can get the SDK token from your home page: https://modelscope.cn/my/myaccesstoken.

MonolithFoundation commented 5 days ago

Hi, how do I deal with this error?

  File "/tests/test_speakersep.py", line 97, in get_asr_spk
    res = self.model.generate(
  File "/FunASR/funasr/auto/auto_model.py", line 303, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/FunASR/funasr/auto/auto_model.py", line 553, in inference_with_vad
    sv_output = postprocess(all_segments, None, labels, spk_embedding.cpu())
  File "FunASR/funasr/models/campplus/utils.py", line 117, in postprocess
    assert len(segments) == len(labels)
AssertionError

MonolithFoundation commented 5 days ago

Hello, would anyone like to help with this? Currently, Whisper Turbo is not stable at all.

LauraGPT commented 5 days ago

Just use it as shown in the demos; other usages are not supported for now: https://github.com/modelscope/FunASR/tree/main/examples/industrial_data_pretraining/whisper

MonolithFoundation commented 5 days ago

Could the mismatch between labels and segments be caused by this setting? vad_kwargs={"max_single_segment_time": 30000}
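`max_single_segment_time` is given in milliseconds, so 30000 caps each VAD segment at 30 s, which lines up with the 30-second audio window Whisper processes. A cap like this changes how many segments there are: one long stretch of speech becomes several pieces. The sketch below illustrates only that arithmetic with a naive fixed-length split; `cap_segments` and the sample data are hypothetical, this is not FunASR's VAD implementation, and the thread does not confirm that this setting is what triggers the assertion.

```python
def cap_segments(segments, max_ms):
    """Split each [start_ms, end_ms] segment into consecutive pieces no longer than max_ms."""
    if max_ms <= 0:
        raise ValueError("max_ms must be positive")
    capped = []
    for start, end in segments:
        cursor = start
        # Emit full-length pieces while the remainder is still longer than the cap.
        while end - cursor > max_ms:
            capped.append([cursor, cursor + max_ms])
            cursor += max_ms
        capped.append([cursor, end])
    return capped


if __name__ == "__main__":
    vad_segments = [[0, 12_000], [15_000, 85_000]]  # hypothetical VAD output, in ms
    print(cap_segments(vad_segments, 30_000))
```

Here two input segments become four, so any downstream step that expects one entry per original segment would see a different count.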

MonolithFoundation commented 5 days ago

Why doesn't Whisper support speaker recognition?

TurboMa commented 5 days ago

Same for me.

LauraGPT commented 5 days ago

> Why doesn't Whisper support speaker recognition?

Whisper models lack timestamps for speaker recognition.

TurboMa commented 5 days ago

> > Why doesn't Whisper support speaker recognition?
>
> Whisper models lack timestamps for speaker recognition.

According to its model card on huggingface.co, the latest turbo version can be made to predict timestamps.

LauraGPT commented 4 days ago

Whisper's timestamps are sentence-level, but speaker recognition needs word-level timestamps. If you are interested in that, you could try implementing it yourself.
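To make the sentence-level vs. word-level point concrete: a diarization step yields speaker turns, and each piece of transcript has to be mapped onto one of those turns. With a single time range per sentence, a sentence that spans a speaker change can only receive one label; with per-word timestamps, each word is matched on its own. The sketch below assigns labels by maximum time overlap. The turns, the words, and the functions `overlap` and `assign_speaker` are all hypothetical, and this is not FunASR's implementation.

```python
from typing import List, Tuple

Turn = Tuple[float, float, str]  # (start_s, end_s, speaker) from a diarization step


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(start: float, end: float, turns: List[Turn]) -> str:
    """Pick the speaker whose turn overlaps [start, end] the most (first one wins ties)."""
    best = max(turns, key=lambda t: overlap(start, end, t[0], t[1]))
    return best[2]


if __name__ == "__main__":
    turns = [(0.0, 2.0, "spk0"), (2.0, 4.0, "spk1")]
    # One "sentence" spoken across the speaker change at 2.0 s.
    words = [(0.0, 0.8, "hello"), (0.9, 1.8, "there"), (2.1, 2.6, "yes"), (2.7, 3.5, "hi")]

    # Sentence-level: one range covers every word, so a single label wins (spk0 here).
    print("sentence:", assign_speaker(words[0][0], words[-1][1], turns))

    # Word-level: "yes" and "hi" are correctly attributed to spk1.
    for start, end, text in words:
        print(text, assign_speaker(start, end, turns))
```

With only the sentence range (0.0 to 3.5 s), spk0 overlaps 2.0 s and spk1 overlaps 1.5 s, so the whole sentence goes to spk0 even though half of it was spoken by spk1.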

MonolithFoundation commented 4 days ago

Hi, what if we use a VAD model first?

TurboMa commented 4 days ago

Thanks, very impressive.

MonolithFoundation commented 4 days ago

Hi, I still don't understand: why must speaker recognition be word-level?