Support Whisper-v3-large-turbo

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

https://www.funasr.com


Support Whisper-v3-large-turbo #2132

Closed MonolithFoundation closed 2 weeks ago

MonolithFoundation commented 1 month ago

Support Whisper-v3-large-turbo

LauraGPT commented 1 month ago

Please update to funasr 1.1.12:

https://github.com/modelscope/FunASR/tree/main/examples/industrial_data_pretraining/whisper

MonolithFoundation commented 1 month ago

Thanks for the quick response!

MonolithFoundation commented 1 month ago

s exceeded with url: /funasr/ (Caused by SSLError(SSLError(1, '[SSL] record layer failure (_ssl.c:1006)'))) - skipping
ERROR: Could not find a version that satisfies the requirement funasr==1.1.12 (from versions: 0.3.1, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.6, 0.4.7, 0.4.8, 0.5.0, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.5, 0.5.6, 0.5.8, 0.6.0, 0.6.1, 0.6.2, 0.6.3, 0.6.4, 0.6.5, 0.6.6, 0.6.7, 0.6.9, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.7.5, 0.7.6, 0.7.7, 0.7.8, 0.7.9, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.6, 0.8.7, 0.8.8, 1.0.0, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 1.0.10, 1.0.11, 1.0.12, 1.0.14, 1.0.15, 1.0.16, 1.0.17, 1.0.18, 1.0.19, 1.0.20, 1.0.21, 1.0.22, 1.0.23, 1.0.24, 1.0.25, 1.0.26, 1.0.27, 1.0.28, 1.0.29, 1.0.30, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.8, 1.1.9, 1.1.11)
ERROR: No matching distribution found for funasr==1.1.12

MonolithFoundation commented 1 month ago

After installing from git:

Authentication token does not exist, failed to access model Whisper-large-v3-turbo which may not exist or may be private. Please login first.

LauraGPT commented 1 month ago

Please update funasr again and retry: https://github.com/modelscope/FunASR/commit/cd684580991661b9a088361bea2d7f00735178d3

slin000111 commented 1 month ago

> After installing from git:
>
> Authentication token does not exist, failed to access model Whisper-large-v3-turbo which may not exist or may be private. Please login first.

modelscope login --token YOUR_MODELSCOPE_SDK_TOKEN

You can get the SDK token from your ModelScope home page: https://modelscope.cn/my/myaccesstoken.

MonolithFoundation commented 1 month ago

Hi, how can I deal with this error?

File "/tests/test_speakersep.py", line 97, in get_asr_spk
    res = self.model.generate(
File "/FunASR/funasr/auto/auto_model.py", line 303, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
File "/FunASR/funasr/auto/auto_model.py", line 553, in inference_with_vad
    sv_output = postprocess(all_segments, None, labels, spk_embedding.cpu())
File "FunASR/funasr/models/campplus/utils.py", line 117, in postprocess
    assert len(segments) == len(labels)
AssertionError

MonolithFoundation commented 1 month ago

Hello, would anyone like to help with this? Currently Whisper Turbo is not stable at all.

LauraGPT commented 1 month ago

Just use it as shown in the demos; other usages are not supported at the moment: https://github.com/modelscope/FunASR/tree/main/examples/industrial_data_pretraining/whisper

MonolithFoundation commented 1 month ago

Could the mismatch between labels and segments be caused by this setting? vad_kwargs={"max_single_segment_time": 30000}
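For context on the setting mentioned above: as far as we know, `max_single_segment_time` in FunASR's VAD is given in milliseconds, so 30000 caps each speech segment at 30 s, which matches Whisper's 30-second input window. The sketch below uses a hypothetical helper (`cap_segments` is not a FunASR function, and FunASR may split at different points) to show how such a cap splits long segments and therefore changes how many segments later stages see. Whether this is what triggers the `assert len(segments) == len(labels)` failure is not established in this thread.

```python
from typing import List


def cap_segments(segments: List[List[int]], max_ms: int = 30000) -> List[List[int]]:
    """Split [start_ms, end_ms] segments so that none is longer than max_ms.

    Hypothetical illustration only, not FunASR's implementation.
    """
    capped = []
    for start, end in segments:
        # Cut fixed-size max_ms chunks off the front until the rest fits.
        while end - start > max_ms:
            capped.append([start, start + max_ms])
            start += max_ms
        capped.append([start, end])
    return capped


print(cap_segments([[0, 70000], [80000, 90000]]))
# [[0, 30000], [30000, 60000], [60000, 70000], [80000, 90000]]
```

Because one 70 s segment becomes three, any stage that pairs segments one-to-one with per-segment outputs has to work from the same capped list.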

MonolithFoundation commented 1 month ago

Why is speaker recognition not supported for Whisper?

TurboMa commented 1 month ago

> Hi, how can I deal with this error?
> ...
> assert len(segments) == len(labels)
> AssertionError

Same for me.

LauraGPT commented 1 month ago

> Why is speaker recognition not supported for Whisper?

Whisper models lack timestamps for speaker recognition.

TurboMa commented 1 month ago

> Whisper models lack timestamps for speaker recognition.

According to its model card on huggingface.co, the latest Turbo version can be made to predict timestamps.

LauraGPT commented 1 month ago

> According to its model card on huggingface.co, the latest Turbo version can be made to predict timestamps.

Whisper's timestamps are sentence-level, whereas speaker recognition needs word-level timestamps. If you are interested in that, you could try implementing it yourself.
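The point about sentence-level versus word-level timestamps can be illustrated with a minimal sketch. It assumes speaker turns are already known from a diarization step, and the names (`speaker_at`, `label_words`, `label_sentence`) are hypothetical, not part of FunASR. When one sentence spans a speaker change, a single sentence-level time span can only be attributed to one speaker, while word-level timestamps let each word be assigned separately.

```python
from typing import List, Tuple

# (start_s, end_s, speaker_id), assumed to come from a diarization step
Turn = Tuple[float, float, int]
# (start_s, end_s, text) for either a single word or a whole sentence
Span = Tuple[float, float, str]


def speaker_at(t: float, turns: List[Turn]) -> int:
    """Return the speaker whose turn contains time t, or -1 if none does."""
    for start, end, spk in turns:
        if start <= t < end:
            return spk
    return -1


def label_words(words: List[Span], turns: List[Turn]) -> List[Tuple[str, int]]:
    """Word-level: each word gets the speaker active at the word's midpoint."""
    return [(text, speaker_at((s + e) / 2, turns)) for s, e, text in words]


def label_sentence(sentence: Span, turns: List[Turn]) -> Tuple[str, int]:
    """Sentence-level: the whole sentence gets a single speaker."""
    s, e, text = sentence
    return (text, speaker_at((s + e) / 2, turns))


turns = [(0.0, 2.0, 0), (2.0, 4.0, 1)]
words = [(0.0, 0.5, "hello"), (0.5, 1.5, "there"), (2.1, 2.8, "hi"), (2.8, 3.6, "back")]
sentence = (0.0, 3.6, "hello there hi back")

print(label_words(words, turns))
# [('hello', 0), ('there', 0), ('hi', 1), ('back', 1)]
print(label_sentence(sentence, turns))
# ('hello there hi back', 0)
```

With word-level times, "hi back" is attributed to speaker 1; with one sentence-level span, all four words end up with speaker 0.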

MonolithFoundation commented 1 month ago

Hi, what if we use the VAD model first?

TurboMa commented 1 month ago

> Whisper's timestamps are sentence-level, whereas speaker recognition needs word-level timestamps. If you are interested in that, you could try implementing it yourself.

Thanks, very impressive.

MonolithFoundation commented 1 month ago

Hi, I still don't understand: why must speaker recognition be word-level?