modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Apache License 2.0

Error occurred during "bash run.sh" for speaker diarization #58

Closed NathanJHLee closed 7 months ago

NathanJHLee commented 7 months ago

Hi, my name is Nathan. I am trying to test 3D-Speaker to get an RTTM file from the pretrained model on ModelScope, but I get the error below.

(3D-Speaker) [asr@0419bb3cf325 speaker-diarization]$ bash run.sh
Stage 1: Prepare input wavs...
--2024-02-05 09:07:39-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2528044 (2.4M) [application/octet-stream]
Saving to: 'examples/2speakers_example.wav'

100%[===========================================================================>] 2,528,044 831KB/s in 3.0s

2024-02-05 09:07:43 (831 KB/s) - 'examples/2speakers_example.wav' saved [2528044/2528044]

--2024-02-05 09:07:43-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 380 [application/octet-stream]
Saving to: 'examples/2speakers_example.rttm'

100%[===========================================================================>] 380 --.-K/s in 0s

2024-02-05 09:07:44 (40.0 MB/s) - 'examples/2speakers_example.rttm' saved [380/380]

Stage2: Do vad for input wavs...
2024-02-05 09:07:46,885 - modelscope - INFO - PyTorch version 1.13.1 Found.
2024-02-05 09:07:46,886 - modelscope - INFO - Loading ast index from /home/asr/.cache/modelscope/ast_indexer
2024-02-05 09:07:47,056 - modelscope - INFO - Updating the files for the changes of local files, first time updating will take longer time! Please wait till updating done!
2024-02-05 09:07:47,083 - modelscope - INFO - AST-Scanning the path "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/modelscope" with the following sub folders ['models', 'metrics', 'pipelines', 'preprocessors', 'trainers', 'msdatasets', 'exporters']
2024-02-05 09:08:18,037 - modelscope - INFO - Scanning done! A number of 964 components indexed or updated! Time consumed 30.954344987869263s
2024-02-05 09:08:18,114 - modelscope - INFO - Loading done! Current index file version is 1.12.0, with md5 ccb085697b83dbefd09232fac3402a63 and a total number of 964 components indexed
Please install rotary_embedding_torch by: pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by: pip install -U rotary_embedding_torch
Please Requires the ffmpeg CLI and ffmpeg-python package to be installed.
Please install rotary_embedding_torch by: pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by: pip install -U rotary_embedding_torch
2024-02-05 09:08:22,477 - modelscope - WARNING - Model revision not specified, use revision: v2.0.4
2024-02-05 09:08:22,825 - modelscope - INFO - initiate model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,826 - modelscope - INFO - initiate model from location /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-02-05 09:08:22,827 - modelscope - INFO - initialize model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,874 - modelscope - WARNING - No preprocessor field found in cfg.
2024-02-05 09:08:22,875 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-02-05 09:08:22,875 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch'}. trying to build by task and model information.
2024-02-05 09:08:22,875 - modelscope - WARNING - No preprocessor key ('funasr', 'voice-activity-detection') found in PREPROCESSOR_MAP, skip building preprocessor.
2024-02-05 09:08:22,876 - modelscope - INFO - cuda is not available, using cpu instead.
[INFO]: Start computing VAD...
rtf_avg: 0.043: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.22it/s]
Traceback (most recent call last):
  File "local/voice_activity_detection.py", line 90, in <module>
    main()
  File "local/voice_activity_detection.py", line 71, in main
    for vad_t in vad_time['text']:
TypeError: list indices must be integers or slices, not str

If I print "vad_time", I get:

check [{'key': 'rand_key_2yW4Acq9GFz6Y', 'value': [[5240, 29010], [29290, 37360], [37640, 67570], [67860, 78980]]}]

I don't understand what the 'text' key is supposed to refer to, since the result is a list and has no such key. Please look into this problem. Thank you.
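For reference, the traceback and the printed value above suggest a shape mismatch: the script indexes vad_time['text'] as if the VAD result were a dict, while the installed packages return a list of dicts with 'key' and 'value' fields. A minimal sketch of a tolerant reader follows. The helper name extract_vad_segments is hypothetical, the older {'text': ...} shape is inferred only from the failing line, and treating the numbers as milliseconds is an assumption. It is not the repository's own fix.

```python
def extract_vad_segments(vad_result):
    """Return a list of [start, end] pairs from a VAD result.

    Two shapes are handled (the older one is an assumption inferred
    from the failing line `vad_time['text']`):
      - list shape:  [{'key': ..., 'value': [[start, end], ...]}]
      - dict shape:  {'text': [[start, end], ...]}
    """
    if isinstance(vad_result, dict):
        # Shape the script's vad_time['text'] lookup appears to expect.
        return list(vad_result.get('text', []))
    segments = []
    for item in vad_result:
        # Shape printed in this issue: segments live under 'value'.
        segments.extend(item.get('value', []))
    return segments


# Value copied from the printed output in this issue.
vad_time = [{'key': 'rand_key_2yW4Acq9GFz6Y',
             'value': [[5240, 29010], [29290, 37360],
                       [37640, 67570], [67860, 78980]]}]

segments = extract_vad_segments(vad_time)
for start, end in segments:
    # Assumption: values are milliseconds, so divide by 1000 for seconds.
    print(f"{start / 1000:.2f}s - {end / 1000:.2f}s")
```

Accepting both shapes keeps the loop working regardless of which package versions are installed; pinning the versions the script was written against is the other way to resolve the mismatch.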

yfchenlucky commented 7 months ago

We revised the requirements for speaker diarization:

numba==0.56.2
umap-learn
funasr==0.8.4
modelscope==1.10.0
hdbscan

Please try again with these versions, and feel free to ask if you have further questions.
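Before rerunning run.sh, it may help to confirm that the active environment actually matches these pins. The sketch below uses only the standard library's importlib.metadata (available from Python 3.8). The PINNED table is copied from the list above, and the check_pins helper is a hypothetical name introduced here for illustration, not part of the repository.

```python
from importlib import metadata

# Pins copied from the requirements above; None means "any version".
PINNED = {
    "numba": "0.56.2",
    "umap-learn": None,
    "funasr": "0.8.4",
    "modelscope": "1.10.0",
    "hdbscan": None,
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: message} for packages that are missing or mismatched."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if wanted is not None and installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


# Prints one line per problem; prints nothing if every pin is satisfied.
for package, message in check_pins(PINNED).items():
    print(f"{package}: {message}")
```

The get_version parameter exists only so the check can be exercised without touching the real environment; in normal use the default is sufficient.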

yfchenlucky commented 7 months ago

Judging from the error message, this is most likely a problem with the torchaudio version. Please check whether your torchaudio version meets the requirements. We use a Python 3.8 virtual environment, where you can run pip install torchaudio==0.12.0. Have a try!

NathanJHLee commented 7 months ago

Oh, thank you. I had already figured out the torchaudio problem, so I deleted my question yesterday XD. Anyway, I then hit one more import error, this time about the transformers package. I suggest adding pip install transformers to requirements.txt. With that, it finally works fine. Thank you for your help :D

yfchenlucky commented 7 months ago

We don't need transformers in our dependencies; maybe you can try uninstalling transformers instead. And if you find this repository useful, please consider giving it a star.