Notice: In order to resolve issues more efficiently, please raise your issue following the template and provide details.
❓ Questions and Help
Before asking:
I was previously using the paraformer-large model. Even after adding the max_token_length parameter to finetune.sh, it still cannot recognize audio files longer than 20 s, and after switching to the long-audio version of paraformer-large the same problem occurs. (For reference, a long-audio inference sketch is included after the log excerpt below.)
This is the runtime output:
{'scp_file_list': ['/home/ubuntu1/data/list/train_wav.scp', '/home/ubuntu1/data/list/train_text.txt'], 'data_type_list': ['source', 'target'], 'jsonl_file_out': '/home/ubuntu1/data/list/train.jsonl'}
convert wav.scp text to jsonl, ncpu: 32
cpu: 0: 100%|██████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 5.67it/s]
cpu: 0: 100%|████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 4804.47it/s]
processed 5 samples
{'scp_file_list': ['/home/ubuntu1/data/list/val_wav.scp', '/home/ubuntu1/data/list/val_text.txt'], 'data_type_list': ['source', 'target'], 'jsonl_file_out': '/home/ubuntu1/data/list/val.jsonl'}
convert wav.scp text to jsonl, ncpu: 32
cpu: 0: 100%|██████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.29it/s]
cpu: 0: 100%|████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3818.21it/s]
processed 2 samples
log_file: ./outputs/log.txt
This is what the log file shows:
Model summary:
Class Name: BiCifParaformer
Total Number of model parameters: 225.07 M
Number of trainable parameters: 225.07 M (100.0%)
Type: torch.float32
[2024-06-24 15:32:07,818][root][INFO] - Build optim
[2024-06-24 15:32:07,822][root][INFO] - Build scheduler
[2024-06-24 15:32:07,823][root][INFO] - Build dataloader
[2024-06-24 15:32:07,823][root][INFO] - Build dataloader
[2024-06-24 15:32:07,823][root][INFO] - total_num of samplers: 1, /home/ubuntu1/data/list/train.jsonl
[2024-06-24 15:32:07,823][root][INFO] - total_num of samplers: 2, /home/ubuntu1/data/list/val.jsonl
[2024-06-24 15:32:07,823][root][WARNING] - distributed is not initialized, only single shard
[2024-06-24 15:32:07,853][root][INFO] - Train epoch: 0, rank: 0
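For reference, long audio in FunASR is normally handled at inference time by chaining a VAD model in front of the recognizer. The sketch below follows the AutoModel example from the FunASR README; the model names and the batch_size_s value are the stock README examples, not this fine-tuned BiCifParaformer checkpoint, so treat it as an assumption rather than the fix:

```python
from funasr import AutoModel

# Minimal long-audio inference sketch following the FunASR README examples.
# "paraformer-zh", "fsmn-vad" and "ct-punc" are the stock README model names
# (assumptions here); replace "paraformer-zh" with the fine-tuned checkpoint
# path once training finishes.
model = AutoModel(
    model="paraformer-zh",   # ASR model
    vad_model="fsmn-vad",    # VAD splits long recordings into short segments
    punc_model="ct-punc",    # optional punctuation restoration
)

# batch_size_s=300 batches the VAD segments up to ~300 s of audio in total,
# which is how the README demonstrates decoding recordings longer than 20 s.
res = model.generate(input="/path/to/long_audio.wav", batch_size_s=300)
print(res)
```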
What have you tried?
In finetune.sh I have already added:
++dataset_conf.max_token_length=30000 \
but it still has no effect.
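For reference, one quick way to check whether the fine-tuning set actually contains utterances longer than 20 s is to read durations straight from the Kaldi-style wav.scp (one "utt_id /path/to/file.wav" per line, as used above). This is a hypothetical diagnostic helper, not part of FunASR or finetune.sh:

```python
import soundfile as sf

# Hypothetical diagnostic: list utterances in a Kaldi-style wav.scp whose
# duration exceeds a threshold. Assumes each line is "utt_id /path/to/file.wav".
def long_utterances(scp_path, threshold_s=20.0):
    hits = []
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) != 2:
                continue  # skip empty or malformed lines
            utt_id, wav_path = parts
            duration = sf.info(wav_path).duration  # duration in seconds
            if duration > threshold_s:
                hits.append((utt_id, duration))
    return hits

for utt_id, dur in long_utterances("/home/ubuntu1/data/list/train_wav.scp"):
    print(f"{utt_id}\t{dur:.1f}s")
```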
What's your environment?
How you installed funasr (pip, source):