modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Apache License 2.0
1.02k stars · 89 forks

Error when running run_audio.sh in speaker diarization #74

Closed Coconut059 closed 5 months ago

Coconut059 commented 5 months ago

Hello, I got the following error when running the program; it looks like a package-version problem.

run_audio.sh Stage 1: Prepare input wavs...
--2024-03-04 12:49:43--  https://modelscope.cn/api/v1/models/damo/speech_campplus_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2528044 (2.4M) [application/octet-stream]
Saving to: 'examples/2speakers_example.wav'


2024-03-04 12:49:47 (708 KB/s) - 'examples/2speakers_example.wav' saved [2528044/2528044]

--2024-03-04 12:49:47--  https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 380 [application/octet-stream]
Saving to: 'examples/2speakers_example.rttm'


2024-03-04 12:49:48 (1.19 MB/s) - 'examples/2speakers_example.rttm' saved [380/380]

run_audio.sh Stage2: Do vad for input wavs...
D:\Anaconda3\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda3\lib\site-packages\numpy.libs\libopenblas.FB5AE2TYXYH2IJRDKGDGQ3XBKLKTF43H.gfortran-win_amd64.dll
D:\Anaconda3\lib\site-packages\numpy.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
  warnings.warn("loaded more than 1 DLL from .libs:"
2024-03-04 12:49:50,197 - modelscope - INFO - PyTorch version 2.2.1 Found.
2024-03-04 12:49:50,199 - modelscope - INFO - Loading ast index from C:\Users\Coconuttt\.cache\modelscope\ast_indexer
2024-03-04 12:49:50,356 - modelscope - INFO - Loading done! Current index file version is 1.10.0, with md5 6d2959ed63b0f2682e848d2d1a7b8118 and a total number of 946 components indexed
2024-03-04 12:49:53,171 - modelscope - INFO - Use user-specified model revision: v2.0.4
2024-03-04 12:49:53,553 - modelscope - WARNING - ('PIPELINES', 'voice-activity-detection', 'funasr-pipeline') not found in ast index file
Traceback (most recent call last):
  File "local/voice_activity_detection.py", line 93, in <module>
    main()
  File "local/voice_activity_detection.py", line 59, in main
    vad_pipeline = pipeline(
  File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\builder.py", line 170, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\builder.py", line 65, in build_pipeline
    return build_from_cfg(
  File "D:\Anaconda3\lib\site-packages\modelscope\utils\registry.py", line 198, in build_from_cfg
    raise KeyError(
KeyError: 'funasr-pipeline is not in the pipelines registry group voice-activity-detection. Please make sure the correct version of ModelScope library is used.'

I tried several version combinations: ① funasr==0.8.4 + modelscope==1.10.0, ② funasr==0.8.7 + modelscope==1.10.0, ③ funasr==0.8.8 + modelscope==1.10.0, but none of them solved the problem.
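For context, the KeyError above comes from ModelScope's plugin registry: pipeline classes register themselves under a (task-group, name) key at import time, and a modelscope release that predates the "funasr-pipeline" registration has nothing to look up. A minimal stand-in (not ModelScope's actual implementation) reproduces the failure mode:

```python
# Simplified sketch of a pipeline registry; older modelscope versions never
# execute the registration for "funasr-pipeline", so the lookup fails.
class Registry:
    def __init__(self):
        self._groups = {}

    def register(self, group, name, cls):
        # Normally called as a side effect of importing the pipeline module.
        self._groups.setdefault(group, {})[name] = cls

    def build_from_cfg(self, group, name):
        try:
            return self._groups[group][name]()
        except KeyError:
            raise KeyError(
                f"{name} is not in the pipelines registry group {group}. "
                "Please make sure the correct version of ModelScope library is used."
            ) from None

reg = Registry()
# The registration line below never runs in an old modelscope, hence the error:
# reg.register("voice-activity-detection", "funasr-pipeline", SomePipeline)
try:
    reg.build_from_cfg("voice-activity-detection", "funasr-pipeline")
except KeyError as e:
    print(e)
```

This is why pinning funasr alone cannot help when modelscope itself is too old: the missing entry lives on the modelscope side.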

wanghuii1 commented 5 months ago

funasr 1.x.x has been released. Please update it to the latest version.

Coconut059 commented 5 months ago

I tried updating funasr to the latest version, but the result is the same (modelscope==1.11.0): KeyError: 'funasr-pipeline is not in the pipelines registry group voice-activity-detection. Please make sure the correct version of ModelScope library is used.'

Then I updated modelscope to the latest version, 1.12.0, and a new error occurred.

2024-03-04 14:25:07,616 - modelscope - INFO - initiate model from C:\Users\Coconuttt\.cache\modelscope\hub\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-03-04 14:25:07,616 - modelscope - INFO - initiate model from location C:\Users\Coconuttt\.cache\modelscope\hub\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-03-04 14:25:07,619 - modelscope - INFO - initialize model from C:\Users\Coconuttt\.cache\modelscope\hub\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch
Traceback (most recent call last):
  File "D:\Anaconda3\lib\site-packages\modelscope\utils\registry.py", line 212, in build_from_cfg
    return obj_cls(**args)
  File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\audio\funasr_pipeline.py", line 62, in __init__
    super().__init__(model=model, **kwargs)
  File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\base.py", line 100, in __init__
    self.model = self.initiate_single_model(model, **kwargs)
  File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\base.py", line 53, in initiate_single_model
    return Model.from_pretrained(
  File "D:\Anaconda3\lib\site-packages\modelscope\models\base\base_model.py", line 183, in from_pretrained
    model = build_model(model_cfg, task_name=task_name)
  File "D:\Anaconda3\lib\site-packages\modelscope\models\builder.py", line 35, in build_model
    model = build_from_cfg(
  File "D:\Anaconda3\lib\site-packages\modelscope\utils\registry.py", line 184, in build_from_cfg
    LazyImportModule.import_module(sig)
  File "D:\Anaconda3\lib\site-packages\modelscope\utils\import_utils.py", line 475, in import_module
    importlib.import_module(module_name)
  File "D:\Anaconda3\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "D:\Anaconda3\lib\site-packages\modelscope\models\audio\funasr\model.py", line 7, in <module>
    from funasr import AutoModel
ImportError: cannot import name 'AutoModel' from 'funasr' (unknown location)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "local/voice_activity_detection.py", line 93, in <module>
    main()
  File "local/voice_activity_detection.py", line 59, in main
    vad_pipeline = pipeline(
  File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\builder.py", line 170, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\builder.py", line 65, in build_pipeline
    return build_from_cfg(
  File "D:\Anaconda3\lib\site-packages\modelscope\utils\registry.py", line 215, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
ImportError: FunASRPipeline: cannot import name 'AutoModel' from 'funasr' (unknown location)
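The two errors in this thread are two sides of one version mismatch: an old modelscope never registers "funasr-pipeline" (the KeyError), while a new modelscope needs `AutoModel`, which only exists in the funasr 1.x line (the ImportError). A small diagnostic sketch to see which side you are on:

```python
# Diagnostic sketch: probe the installed funasr for AutoModel.
# funasr 0.8.x lacks it (causing the ImportError above); funasr >= 1.0 has it.
def funasr_status() -> str:
    try:
        from funasr import AutoModel  # noqa: F401
        return "ok: funasr 1.x with AutoModel available"
    except ImportError as e:
        return f"problem: {e}"

print(funasr_status())
```

If this prints a problem while `pip show funasr` claims a 1.x version, the interpreter running the recipe is likely not the environment you upgraded.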

wanghuii1 commented 5 months ago

You can type "pip install --upgrade funasr". @Coconut059

Coconut059 commented 5 months ago

Still the same error (after upgrading funasr, with modelscope==1.10.0 or 1.11.0): KeyError: 'funasr-pipeline is not in the pipelines registry group voice-activity-detection. Please make sure the correct version of ModelScope library is used.' When I update modelscope to 1.12.0, the error turns into: ImportError: FunASRPipeline: cannot import name 'AutoModel' from 'funasr' (unknown location). I really don't know what to do.

wanghuii1 commented 5 months ago

What is your funasr version? Please ensure it is the latest (1.0.11).
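When juggling these pins, it helps to print what is actually installed in the interpreter that runs the recipe, rather than what pip last reported. A generic stdlib check (the names are the PyPI distribution names):

```python
# Report the installed versions of the two packages this thread keeps pinning,
# as seen by the current Python interpreter.
from importlib import metadata

for dist in ("modelscope", "funasr"):
    try:
        print(dist, metadata.version(dist))
    except metadata.PackageNotFoundError:
        print(dist, "not installed")
```

Running this inside the same environment as run_audio.sh rules out the common case of upgrading one Python while the script uses another.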

Coconut059 commented 5 months ago

I've updated to the latest version but it's still the same error, so I'm running it on Jupyter instead, and now I'm getting another error.

Computing DER...
2024-03-04 20:20:54,429 - INFO: Concatenating individual RTTM files...
Traceback (most recent call last):
  File "/mnt/workspace/3D-Speaker-main/egs/3dspeaker/speaker-diarization/local/compute_der.py", line 72, in <module>
    main(args)
  File "/mnt/workspace/3D-Speaker-main/egs/3dspeaker/speaker-diarization/local/compute_der.py", line 47, in main
    [MS, FA, SER, DER] = DER(
  File "/mnt/workspace/3D-Speaker-main/egs/3dspeaker/speaker-diarization/local/DER.py", line 103, in DER
    stdout = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/conda/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/conda/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/opt/conda/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/conda/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/mnt/workspace/3D-Speaker-main/egs/3dspeaker/speaker-diarization/local/md-eval.pl'
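This PermissionError is about file permissions, not Python: subprocess tries to exec md-eval.pl directly, and the kernel refuses because the script lacks its execute bit (a common side effect of downloading the repo as a zip instead of cloning it). A self-contained sketch of the failure and the chmod fix, using a throwaway shell script as a stand-in for md-eval.pl:

```python
import os
import subprocess
import tempfile

# Stand-in for md-eval.pl: a tiny script without the execute bit set.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write("#!/bin/sh\necho ok\n")
    script = f.name
os.chmod(script, 0o644)  # rw-r--r--: readable but not executable

try:
    subprocess.check_output([script])
except PermissionError as e:
    print("without +x: errno", e.errno)  # errno 13, as in the traceback above

os.chmod(script, 0o755)  # equivalent to: chmod +x local/md-eval.pl
print("with +x:", subprocess.check_output([script]).decode().strip())
os.unlink(script)
```

Another workaround (presumably what the repo's updated DER.py does, though I have not checked) is to invoke the script through its interpreter, e.g. `subprocess.check_output(["perl", path_to_md_eval, ...])`, so the execute bit is never needed.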

wanghuii1 commented 5 months ago

I've updated "egs/3dspeaker/speaker-diarization/local/DER.py". You can fix it by pulling the latest code.

Coconut059 commented 5 months ago

Thanks a lot! It finally worked. Now I'd like to know how to run this speaker diarization model on my own dataset.

wanghuii1 commented 5 months ago

The diarization pipeline is based on a pretrained VAD model and a speaker embedding model. The pretrained VAD model used is "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch" on ModelScope. You can train your own speaker embedding model using any speaker verification recipe in this repo. You can also use other pretrained speaker models from ModelScope by changing the value of "speaker_model_id" in run_audio.sh.
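For running on your own data, the recipe ultimately needs a list mapping utterance IDs to wav paths. A hypothetical helper for building one (the exact list filename and line format expected by run_audio.sh are assumptions; check the script's Stage 1 before relying on this):

```python
from pathlib import Path

def write_wav_list(wav_dir: str, out_path: str) -> int:
    """Write one '<utt-id> <absolute-path>' line per wav file in wav_dir.

    The '<utt-id> <path>' format mirrors common Kaldi-style wav.scp files;
    whether run_audio.sh expects exactly this layout is an assumption.
    """
    wavs = sorted(Path(wav_dir).glob("*.wav"))
    lines = [f"{p.stem} {p.resolve()}" for p in wavs]
    Path(out_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(wavs)
```

For example, `write_wav_list("/data/my_meetings", "examples/wav.list")` would enumerate every meeting recording and return how many were found.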