Open · JVfisher opened this issue 1 week ago
Notice: In order to resolve issues more efficiently, please raise issue following the template. (注意:为了更加高效率解决您遇到的问题,请按照模板提问,补充细节)
❓ Questions and Help
Before asking:
What is your question?
When fine-tuning the seaco_paraformer model on a single GPU with about 120,000 audio clips, with dataset_conf.num_workers set to 16, CPU utilization sits at 100%, but GPU utilization keeps fluctuating between 5% and 20% while GPU memory usage is at 95%. How can I push GPU utilization higher?
Code
torchrun \
  --nnodes 1 \
  --nproc_per_node ${gpu_num} \
  /mnt/data/finetunedir/FunASR/funasr/bin/train_ds.py \
  ++model="${model_name_or_model_dir}" \
  ++train_data_set_list="${train_data}" \
  ++valid_data_set_list="${val_data}" \
  ++dataset="AudioDatasetHotword" \
  ++dataset_conf.index_ds="IndexDSJsonl" \
  ++dataset_conf.data_split_num=1 \
  ++dataset_conf.batch_sampler="BatchSampler" \
  ++dataset_conf.batch_size=7500 \
  ++dataset_conf.max_token_length=2000 \
  ++dataset_conf.batch_type="token" \
  ++dataset_conf.num_workers=16 \
  ++train_conf.max_epoch=60 \
  ++train_conf.log_interval=1 \
  ++train_conf.resume=true \
  ++train_conf.validate_interval=8000 \
  ++train_conf.save_checkpoint_interval=8000 \
  ++train_conf.avg_keep_nbest_models_type='loss' \
  ++train_conf.keep_nbest_models=10 \
  ++optim_conf.lr=0.0002 \
  ++output_dir="${output_dir}" &> ${log_file}
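For reference, a quick way to see where the stall is while this job runs (a diagnostic sketch using standard tools, nothing FunASR-specific):

# sample GPU utilization (u) and memory activity (m) once per second
nvidia-smi dmon -s um -d 1
# in a second terminal, watch disk throughput to see whether reading the audio/jsonl data is the limit
iostat -x 1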
What have you tried?
I have already tried both increasing and decreasing num_workers, with no effect.
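Since varying num_workers changes nothing, a possible next check (again only a sketch with standard tools, not part of the original report) is whether the 100% CPU is actually spread across the dataloader workers or concentrated in a single process doing preprocessing:

# per-process CPU usage, sampled every second
pidstat -u 1 | grep -i python
# alternatively run top and press "1": one saturated core with the rest idle usually
# points to a single-threaded CPU stage rather than the worker pool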
What's your environment?
How you installed funasr (pip, source): source