Accuracy drops sharply when fine-tuning the model, and the loss does not converge for a long time

harold-yh commented 8 months ago

OS: centos7.9 x86_64
Python/C++ Version: python3.8
Package Version: pytorch-0.0.1, torchaudio-2.1.0, modelscope-1.9.4, funasr-0.8.4
Model: speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
Command: python finetune.py
Details: GPU: 3090 with 24 GB of memory. The training data is roughly 30,000 hours. The finetune parameters are:
params.dataset_type = "large"
params.batch_bins = 130000
params.max_epoch = 20
params.lr = 0.0007

Problem description: acc was 0.29 at the first batch, but at batch 281150 (after more than two days of training) it is only 0.086. Is this a data problem?
INFO: 1epoch:train:1-50batch:50num_updates: iter_time=0.058, forward_time=0.372, loss_att=0.840, acc=0.259, loss_pre=0.326, loss=1.166, backward_time=0.202, optim_step_time=0.023, optim0_lr0=6.183e-07, train_time=0.731
.........
INFO: 1epoch:train:281101-281150batch:281150num_updates: iter_time=9.407e-05, forward_time=0.287, loss_att=0.409, acc=0.086, loss_pre=0.457, loss=0.866, backward_time=0.235, optim_step_time=0.020, optim0_lr0=2.287e-04, train_time=0.576
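To see how acc, loss, and the learning rate move over the whole run rather than at two points, the log lines can be parsed into numbers. The following is a minimal sketch that assumes every training line follows the `key=value` layout shown above. The names `parse_log_line`, `METRIC_RE`, and `STEP_RE` are hypothetical helpers, not part of FunASR.

```python
import re

# Matches "key=value" pairs such as "acc=0.259" or "optim0_lr0=6.183e-07".
METRIC_RE = re.compile(r"(\w+)=([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")
# Matches the "1epoch:train:1-50batch" prefix of a training log line.
STEP_RE = re.compile(r"(\d+)epoch:train:(\d+)-(\d+)batch")


def parse_log_line(line):
    """Return a dict of metrics for one training log line, or None if it is not one."""
    step = STEP_RE.search(line)
    if step is None:
        return None
    metrics = {key: float(value) for key, value in METRIC_RE.findall(line)}
    metrics["epoch"] = int(step.group(1))
    metrics["last_batch"] = int(step.group(3))
    return metrics


if __name__ == "__main__":
    # The two log lines quoted in the report above.
    lines = [
        "INFO: 1epoch:train:1-50batch:50num_updates: iter_time=0.058, "
        "forward_time=0.372, loss_att=0.840, acc=0.259, loss_pre=0.326, "
        "loss=1.166, backward_time=0.202, optim_step_time=0.023, "
        "optim0_lr0=6.183e-07, train_time=0.731",
        "INFO: 1epoch:train:281101-281150batch:281150num_updates: "
        "iter_time=9.407e-05, forward_time=0.287, loss_att=0.409, acc=0.086, "
        "loss_pre=0.457, loss=0.866, backward_time=0.235, "
        "optim_step_time=0.020, optim0_lr0=2.287e-04, train_time=0.576",
    ]
    for line in lines:
        m = parse_log_line(line)
        print(m["last_batch"], m["acc"], m["loss"], m["optim0_lr0"])
```

Running this over the full log file and plotting `acc` against `last_batch` would show whether the drop is gradual or happens at a specific point in training.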

LauraGPT commented 8 months ago

Please check your training data as: https://alibaba-damo-academy.github.io/FunASR/en/egs_modelscope/asr/TEMPLATE/README_zh.html#id12

harold-yh commented 8 months ago


train data:
(asr) [root@nvidia speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch]# head /opt/asr_data/train/text
81947_14029_1 喂你好是史向东先生吗
81949_14029_1 哎我上午跟您通过话的您这个有钱花账单怎么还没有处理现在已经四点半了
81950_14029_0 哦我正往回走着呢是是是中午一直没干完正往回走呢
81952_14029_0 我中午没弄完正往回走着呢我得抓紧回去弄一下
81953_14029_1 嗯那您需要多长时间
81954_14029_0 那我弄完我直接我还就完了
81955_14029_1 啊您什么时候弄完
81956_14029_0 我这回就回去了呀回去我看我的钱我不知道
81957_14029_1 嗯你现在是回去的路上吗
81961_14029_1 哎那您现在抓紧时间好吗

(asr) [root@nvidia speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch]# head /opt/asr_data/train/wav.scp
81947_14029_1 /dstdisk/audio/36/81947_14029_1.wav
81949_14029_1 /dstdisk/audio/36/81949_14029_1.wav
81950_14029_0 /dstdisk/audio/36/81950_14029_0.wav
81952_14029_0 /dstdisk/audio/36/81952_14029_0.wav
81953_14029_1 /dstdisk/audio/36/81953_14029_1.wav
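The first ten lines look well formed, but `head` cannot say anything about the rest of the data. A rough way to check the whole set is sketched below. It assumes both files use the `utt_id value` layout shown above and that the audio is PCM WAV readable by Python's standard `wave` module. The 16000 Hz default is taken from the "16k" in the model name and may not match what your data should be. `load_table` and `check_pair` are hypothetical names, not FunASR functions.

```python
import os
import wave


def load_table(path):
    """Read a Kaldi-style 'utt_id value' file into a dict."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            parts = line.split(maxsplit=1)
            utt_id = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            if utt_id in table:
                raise ValueError(f"{path}:{line_no}: duplicate id {utt_id}")
            table[utt_id] = value
    return table


def check_pair(text_path, scp_path, expected_rate=16000):
    """Report ids that are inconsistent between the text file and wav.scp."""
    text = load_table(text_path)
    scp = load_table(scp_path)
    report = {
        "missing_audio": sorted(set(text) - set(scp)),  # in text, not in wav.scp
        "missing_text": sorted(set(scp) - set(text)),   # in wav.scp, not in text
        "empty_text": sorted(u for u, t in text.items() if not t),
        "missing_file": [],  # wav.scp path does not exist on disk
        "wrong_rate": [],    # sample rate differs from expected_rate
    }
    for utt_id, wav_path in scp.items():
        if not os.path.isfile(wav_path):
            report["missing_file"].append(utt_id)
            continue
        # wave.open raises wave.Error for files it cannot parse (e.g. non-PCM).
        with wave.open(wav_path, "rb") as w:
            if w.getframerate() != expected_rate:
                report["wrong_rate"].append(utt_id)
    return report


if __name__ == "__main__":
    result = check_pair("/opt/asr_data/train/text", "/opt/asr_data/train/wav.scp")
    for name, ids in result.items():
        print(name, len(ids), ids[:5])
```

Any non-empty list in the report is worth looking at before training again. A sample-rate mismatch in particular would not be visible from the `text` and `wav.scp` listings alone.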

LauraGPT commented 8 months ago

Strange! I suggest you first fine-tune the model on only 1,000 hours of data, so that you can verify the data and the hyper-parameters.
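One way to draw such a subset is to shuffle the utterance ids and keep adding utterances until the target duration is reached. The sketch below assumes the durations can be read from PCM WAV headers with the standard `wave` module. `wav_duration` and `select_subset` are hypothetical helper names, and the random seed is arbitrary.

```python
import random
import wave


def wav_duration(path):
    """Duration of a PCM WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())


def select_subset(durations, target_hours, seed=0):
    """Pick a random set of ids whose total duration reaches target_hours.

    durations maps utt_id -> seconds. Returns (chosen_ids, total_seconds).
    """
    ids = sorted(durations)            # sort first so the result is reproducible
    random.Random(seed).shuffle(ids)
    budget = target_hours * 3600.0
    chosen, total = [], 0.0
    for utt_id in ids:
        if total >= budget:
            break
        chosen.append(utt_id)
        total += durations[utt_id]
    return chosen, total


if __name__ == "__main__":
    # Toy example: 100 utterances of 36 s each (1 hour in total).
    toy = {f"utt{i:03d}": 36.0 for i in range(100)}
    ids, seconds = select_subset(toy, target_hours=0.5)
    print(len(ids), seconds / 3600.0)
```

For real data, `durations` could be built by calling `wav_duration` on each path in `wav.scp`. The chosen ids then need to be written to new `text` and `wav.scp` files with matching entries.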

harold-yh commented 7 months ago

Okay, I'll test with 500 hours of data now.

modelscope / FunASR

Accuracy drops sharply when fine-tuning the model, and the loss does not converge for a long time #1117