modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Fine-tuning the seaco speech model fails with 'NoneType' object has no attribute 'contiguous' #2176

Closed smengfei closed 3 weeks ago

smengfei commented 3 weeks ago

Notice: In order to resolve issues more efficiently, please raise the issue following the template.

🐛 Bug

When fine-tuning the SeACo model with the FunASR framework, training fails with 'NoneType' object has no attribute 'contiguous'. Reading the code shows that in FunASR/funasr/datasets/audio_datasets/datasets.py, the seaco_id parameter of the AudioDatasetHotword initializer has the default value bool = 0. Because of this, the collator function leaves seaco_label_pad as None, which later raises the 'NoneType' object has no attribute 'contiguous' exception. After changing the default of seaco_id in the initializer to True, training runs normally, but it is not yet clear whether changing seaco_id affects the training results or the quality of the fine-tuned model.
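For readers who have not looked at that file, the sketch below illustrates the failure pattern described above. It is a simplified, hypothetical reconstruction, not the actual FunASR code: the names AudioDatasetHotword, seaco_id, collator and seaco_label_pad come from the report and the traceback, while the padding logic, dict keys and padding value are only assumptions for illustration.

```python
# Hypothetical sketch of the reported failure, NOT the real
# FunASR/funasr/datasets/audio_datasets/datasets.py implementation.
import torch


class AudioDatasetHotword:
    # Reported bug: the default was effectively falsy ("bool = 0"), so the
    # hotword-label branch stayed disabled unless seaco_id was overridden.
    def __init__(self, seaco_id: bool = True, **kwargs):
        self.seaco_id = seaco_id

    def collator(self, samples):
        seaco_label_pad = None  # stays None when seaco_id is falsy
        if self.seaco_id:
            # Illustrative padding only; key name and padding_value are assumptions.
            seaco_label_pad = torch.nn.utils.rnn.pad_sequence(
                [s["seaco_label"] for s in samples],
                batch_first=True,
                padding_value=-1,
            )
        # A None here flows into SeACo-Paraformer's _calc_seaco_loss, where
        # criterion_seaco eventually calls target.contiguous() and raises
        # "'NoneType' object has no attribute 'contiguous'".
        return {"seaco_label": seaco_label_pad}
```

With the default flipped to True, the collator builds seaco_label_pad and training proceeds, but, as noted above, it is still unclear whether that is the intended configuration for SeACo fine-tuning.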

To Reproduce

Error executing job with overrides:
'++model=/model/meeting_train_model/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
'++train_data_set_list=../../../data/list/train.jsonl',
'++valid_data_set_list=../../../data/list/val.jsonl',
'++dataset=AudioDatasetHotword',
'++dataset_conf.index_ds=IndexDSJsonl',
'++dataset_conf.data_split_num=1',
'++dataset_conf.batch_sampler=BatchSampler',
'++dataset_conf.batch_size=6000',
'++dataset_conf.sort_size=1024',
'++dataset_conf.batch_type=token',
'++dataset_conf.num_workers=4',
'++train_conf.max_epoch=5',
'++train_conf.log_interval=1',
'++train_conf.resume=true',
'++train_conf.validate_interval=2000',
'++train_conf.save_checkpoint_interval=2000',
'++train_conf.avg_keep_nbest_models_type=loss',
'++train_conf.keep_nbest_models=20',
'++train_conf.avg_nbest_model=10',
'++train_conf.use_deepspeed=false',
'++train_conf.deepspeed_config=',
'++optim_conf.lr=0.0002',
'++output_dir=./outputs'

Traceback (most recent call last):
rank0: File "/home/ztyl/FunASR/examples/industrial_data_pretraining/seaco_paraformer/../../../funasr/bin/train_ds.py", line 225, in <module>
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/hydra/main.py", line 94, in decorated_main
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/hydra/_internal/utils.py", line 457, in _run_app
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
rank0:   raise ex
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
rank0:   return func()
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
rank0:   lambda: hydra.run(
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
rank0:   _ = ret.return_value
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
rank0:   raise self._return_value
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
rank0:   ret.return_value = task_function(task_cfg)
rank0: File "/home/ztyl/FunASR/examples/industrial_data_pretraining/seaco_paraformer/../../../funasr/bin/train_ds.py", line 56, in main_hydra
rank0: File "/home/ztyl/FunASR/examples/industrial_data_pretraining/seaco_paraformer/../../../funasr/bin/train_ds.py", line 173, in main
rank0: File "/home/ztyl/FunASR/funasr/train_utils/trainer_ds.py", line 603, in train_epoch
rank0:   self.forward_step(model, batch, loss_dict=loss_dict)
rank0: File "/home/ztyl/FunASR/funasr/train_utils/trainer_ds.py", line 670, in forward_step
rank0:   retval = model(**batch)
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
rank0:   return self._call_impl(*args, **kwargs)
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
rank0:   return forward_call(*args, **kwargs)
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward
rank0:   else self._run_ddp_forward(*inputs, **kwargs)
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward
rank0:   return self.module(*inputs, **kwargs)  # type: ignore[index]
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
rank0:   return self._call_impl(*args, **kwargs)
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
rank0:   return forward_call(*args, **kwargs)
rank0: File "/home/ztyl/FunASR/funasr/models/seaco_paraformer/model.py", line 150, in forward
rank0:   loss_seaco = self._calc_seaco_loss(
rank0: File "/home/ztyl/FunASR/funasr/models/seaco_paraformer/model.py", line 230, in _calc_seaco_loss
rank0:   loss_att = self.criterion_seaco(dha_output, seaco_label_pad)
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
rank0:   return self._call_impl(*args, **kwargs)
rank0: File "/opt/minconda3/envs/meeting/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
rank0:   return forward_call(*args, **kwargs)
rank0: File "/home/ztyl/FunASR/funasr/losses/label_smoothing_loss.py", line 54, in forward
rank0:   target = target.contiguous().view(-1)
rank0: AttributeError: 'NoneType' object has no attribute 'contiguous'

rank0:[W1026 13:14:22.070978864 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
W1026 13:14:24.528461 140404185367168 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 2330636 closing signal SIGTERM
E1026 13:14:25.194758 140404185367168 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 2330635) of binary: /opt/minconda3/envs/meeting/bin/python
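As a sanity check, the last frame of the traceback can be reproduced outside FunASR: passing None where a loss expects a target tensor raises exactly this error. The function below is only a stand-in for the failing line in funasr/losses/label_smoothing_loss.py, not the actual loss implementation.

```python
# Stand-alone reproduction of the final failure mode, independent of FunASR.
import torch


def flatten_target(target: torch.Tensor) -> torch.Tensor:
    # Mirrors the failing line in label_smoothing_loss.py (line 54):
    #   target = target.contiguous().view(-1)
    return target.contiguous().view(-1)


seaco_label_pad = None  # what the collator ends up producing in the reported bug

try:
    flatten_target(seaco_label_pad)
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'contiguous'
```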

Environment

Additional context