modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Why does running the finetune.sh script with the long-audio version of the paraformer-large model still fail to recognize audio files longer than 20 s? #1843

Closed lllmd closed 3 months ago

lllmd commented 3 months ago

Notice: In order to resolve issues more efficiently, please raise your issue following the template.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

Previously, with the paraformer-large model, I added the max_token_length parameter to finetune.sh, but audio files longer than 20 s still could not be recognized. After switching to the long-audio version of paraformer-large, the same problem persists.

This is the output shown at runtime:

```
{'scp_file_list': ['/home/ubuntu1/data/list/train_wav.scp', '/home/ubuntu1/data/list/train_text.txt'], 'data_type_list': ['source', 'target'], 'jsonl_file_out': '/home/ubuntu1/data/list/train.jsonl'}
convert wav.scp text to jsonl, ncpu: 32
cpu: 0: 100%|██████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 5.67it/s]
cpu: 0: 100%|████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 4804.47it/s]
processed 5 samples
{'scp_file_list': ['/home/ubuntu1/data/list/val_wav.scp', '/home/ubuntu1/data/list/val_text.txt'], 'data_type_list': ['source', 'target'], 'jsonl_file_out': '/home/ubuntu1/data/list/val.jsonl'}
convert wav.scp text to jsonl, ncpu: 32
cpu: 0: 100%|██████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.29it/s]
cpu: 0: 100%|████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3818.21it/s]
processed 2 samples
log_file: ./outputs/log.txt
```
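For readers unfamiliar with this preprocessing step: the log above shows FunASR joining a Kaldi-style `wav.scp` with a transcript file and emitting one JSON object per utterance. Below is a minimal sketch of that conversion; the `source`/`target` field names follow the `data_type_list` in the log, but this is an illustration, not FunASR's actual converter (which also computes token/frame counts and runs multiprocess).

```python
import json

def scp_text_to_jsonl(wav_scp, text_file, jsonl_out):
    """Sketch: join a Kaldi-style wav.scp with its transcript file on
    utterance id and write one JSON object per line. Field names
    'source'/'target' mirror the data_type_list shown in the log."""
    wavs = {}
    with open(wav_scp, encoding="utf-8") as f:
        for line in f:
            utt, path = line.strip().split(maxsplit=1)
            wavs[utt] = path
    with open(text_file, encoding="utf-8") as f, \
         open(jsonl_out, "w", encoding="utf-8") as out:
        for line in f:
            utt, text = line.strip().split(maxsplit=1)
            if utt in wavs:  # keep only utterances present in both files
                out.write(json.dumps(
                    {"key": utt, "source": wavs[utt], "target": text},
                    ensure_ascii=False) + "\n")
```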

This is the content of the log file:

```
Model summary:
    Class Name: BiCifParaformer
    Total Number of model parameters: 225.07 M
    Number of trainable parameters: 225.07 M (100.0%)
    Type: torch.float32
[2024-06-24 15:32:07,818][root][INFO] - Build optim
[2024-06-24 15:32:07,822][root][INFO] - Build scheduler
[2024-06-24 15:32:07,823][root][INFO] - Build dataloader
[2024-06-24 15:32:07,823][root][INFO] - Build dataloader
[2024-06-24 15:32:07,823][root][INFO] - total_num of samplers: 1, /home/ubuntu1/data/list/train.jsonl
[2024-06-24 15:32:07,823][root][INFO] - total_num of samplers: 2, /home/ubuntu1/data/list/val.jsonl
[2024-06-24 15:32:07,823][root][WARNING] - distributed is not initialized, only single shard
[2024-06-24 15:32:07,853][root][INFO] - Train epoch: 0, rank: 0
```

What have you tried?

I have already added the following in finetune.sh:

```
++dataset_conf.max_token_length=30000 \
```

but it still has no effect.
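One thing worth noting about the `++` syntax: in Hydra-style override grammar, `+key=value` appends a new key and `++key=value` appends-or-overrides. This means a mistyped or wrongly scoped key (e.g. a key name the dataset class never reads) is silently accepted and simply sits unused in the merged config, which would produce exactly the "no effect" symptom. A hypothetical placement in the launch command, with only the `max_token_length` override taken from this issue and the rest as placeholders:

```shell
# Placeholder sketch of a finetune launch command; '...' stands for
# whatever arguments the script already passes. Only the last line is
# the override under discussion.
python train.py \
    ... \
    ++dataset_conf.max_token_length=30000 \
    ...
```

Checking the dumped/merged config in the output directory is a quick way to confirm whether the override actually landed under the key the dataset reads.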

What's your environment?

LauraGPT commented 3 months ago

https://github.com/modelscope/FunASR/blob/main/funasr/datasets/audio_datasets/index_ds.py#L20
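The maintainer's link points into FunASR's jsonl index-dataset code, i.e. the place where per-sample length limits are applied. As a hypothetical illustration of why the override may appear to do nothing (key names below are assumptions, not the actual FunASR code): a filter of this shape silently drops utterances whose length exceeds the maximum it reads, so if the class reads a differently named key than the one overridden in finetune.sh, long audio is still discarded.

```python
# Hypothetical sketch of a token-length filter like the one the linked
# index_ds.py line belongs to. Key names ('max_token_length',
# 'source_len') are illustrative assumptions.
def filter_by_length(samples, dataset_conf):
    """Drop samples whose source length exceeds the configured maximum."""
    max_len = dataset_conf.get("max_token_length", 2048)
    kept = []
    for s in samples:
        # 'source_len' would be the frame/token count computed during
        # the wav.scp -> jsonl conversion step.
        if s.get("source_len", 0) <= max_len:
            kept.append(s)
    return kept
```

Under this sketch, raising the limit to 30000 keeps the long utterances only if the override reaches this exact key; verifying the key name at the linked line against the one set in finetune.sh is the debugging step the reply suggests.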