Inference fails after LoRA fine-tuning with "RuntimeError: cutlassF: no kernel found to launch!"

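1. Problem description: I ran self-cognition training with LoRA, and it completed normally. When I then run inference from the command line, it fails with "RuntimeError: cutlassF: no kernel found to launch!". Could you please help identify the cause? Thank you very much!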

2. Details:

Self-cognition training (completed normally):

python llm_sft.py \
    --model_type qwen1half-0_5b-chat \
    --model_id_or_path /home/work/pbg/Qwen1.5-0.5B-Chat \
    --dtype fp16 \
    --sft_type lora \
    --tuner_backend peft \
    --output_dir output \
    --dataset blossom-math-zh \
    --train_dataset_sample 1000 \
    --num_train_epochs 2 \
    --max_length 512 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 2 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn false \
    --self_cognition_sample 1000 \
    --model_name 认知模型 \
    --model_author 大数据中心
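As a sanity check on the settings above, here is a small sketch of the batch arithmetic. It assumes a single training device (the command does not say how many were used) and takes the total of 124 optimizer steps from the progress bar in the training output; the function names are illustrative only.

```python
def effective_batch_size(batch_size, grad_accum_steps, num_devices=1):
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return batch_size * grad_accum_steps * num_devices


def samples_seen(total_steps, eff_batch):
    """Total samples processed over the whole run."""
    return total_steps * eff_batch


# Values from the command: --batch_size 2, --gradient_accumulation_steps 16
eff = effective_batch_size(2, 16)

# 124 optimizer steps over --num_train_epochs 2, per the progress bar
total = samples_seen(124, eff)
per_epoch = total / 2

print(f"effective batch size: {eff}")
print(f"samples seen in total: {total}, per epoch: {per_epoch}")
```

Under these assumptions each epoch covers roughly 1984 samples, which is in the same range as the 1000 samples requested with --train_dataset_sample plus the 1000 requested with --self_cognition_sample.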

Training output:
{'loss': 1.04667056, 'acc': 0.76542324, 'learning_rate': 2.5e-05, 'epoch': 0.02, 'global_step': 1}
{'loss': 1.32619603, 'acc': 0.70858145, 'learning_rate': 9.5e-05, 'epoch': 0.16, 'global_step': 10}
{'loss': 1.00710545, 'acc': 0.74251723, 'learning_rate': 8.667e-05, 'epoch': 0.32, 'global_step': 20}
{'loss': 0.88376904, 'acc': 0.76463094, 'learning_rate': 7.833e-05, 'epoch': 0.48, 'global_step': 30}
{'loss': 0.76665134, 'acc': 0.79404073, 'learning_rate': 7e-05, 'epoch': 0.65, 'global_step': 40}
{'loss': 0.665101, 'acc': 0.81325998, 'learning_rate': 6.167e-05, 'epoch': 0.81, 'global_step': 50}
{'loss': 0.70611439, 'acc': 0.79771628, 'learning_rate': 5.333e-05, 'epoch': 0.97, 'global_step': 60}
{'loss': 0.61564317, 'acc': 0.8232132, 'learning_rate': 4.5e-05, 'epoch': 1.13, 'global_step': 70}
{'loss': 0.58244119, 'acc': 0.82987413, 'learning_rate': 3.667e-05, 'epoch': 1.29, 'global_step': 80}
{'loss': 0.58096352, 'acc': 0.83518486, 'learning_rate': 2.833e-05, 'epoch': 1.45, 'global_step': 90}
{'loss': 0.56580772, 'acc': 0.83964577, 'learning_rate': 2e-05, 'epoch': 1.61, 'global_step': 100}
Train: 81%|████████▋ | 100/124 [08:02<01:53, 4.73s/it]
{'eval_loss': 0.54231346, 'eval_acc': 0.83992095, 'eval_runtime': 0.4726, 'eval_samples_per_second': 21.158, 'eval_steps_per_second': 10.579, 'epoch': 1.61, 'global_step': 100}
Val: 100%|██████████| 5/5 [00:00<00:00, 15.47it/s]
[INFO:swift] Saving model checkpoint to /home/work/pbg/swift-main/examples/pytorch/llm/output/qwen1half-0_5b-chat/v1-20240508-134134/checkpoint-100
/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /home/haitaiwork/pbg/Qwen1.5-0.5B-Chat - will assume that the vocabulary was not modified.
  warnings.warn(
{'loss': 0.55450773, 'acc': 0.84238644, 'learning_rate': 1.167e-05, 'epoch': 1.77, 'global_step': 110}
{'loss': 0.54164171, 'acc': 0.84827271, 'learning_rate': 3.33e-06, 'epoch': 1.94, 'global_step': 120}
Train: 100%|██████████| 124/124 [09:55<00:00, 4.57s/it]
{'eval_loss': 0.54169428, 'eval_acc': 0.83992095, 'eval_runtime': 0.4762, 'eval_samples_per_second': 20.999, 'eval_steps_per_second': 10.499, 'epoch': 2.0, 'global_step': 124}
Val: 100%|██████████| 5/5 [00:00<00:00, 15.18it/s]
[INFO:swift] Saving model checkpoint to /home/work/pbg/swift-main/examples/pytorch/llm/output/qwen1half-0_5b-chat/v1-20240508-134134/checkpoint-124
/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /home/haitaiwork/pbg/Qwen1.5-0.5B-Chat - will assume that the vocabulary was not modified.
  warnings.warn(
{'train_runtime': 596.8382, 'train_samples_per_second': 6.702, 'train_steps_per_second': 0.208, 'train_loss': 0.7253726, 'epoch': 2.0, 'global_step': 124}
Train: 100%|██████████| 124/124 [09:56<00:00, 4.81s/it]
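To back up the statement that training finished normally, the metric dictionaries in the output above can be parsed and compared. This is only a minimal sketch: `parse_metrics` is an illustrative helper, not part of swift, and `SAMPLE` holds a few lines copied from the log above.

```python
import ast
import re


def parse_metrics(text):
    """Extract every flat {...} dict literal found in the log text."""
    records = []
    # The regex also finds dicts embedded in progress-bar lines.
    for chunk in re.findall(r"\{[^{}]*\}", text):
        try:
            record = ast.literal_eval(chunk)
        except (ValueError, SyntaxError):
            continue  # not a Python literal, skip it
        if isinstance(record, dict):
            records.append(record)
    return records


# A few lines copied from the training output above.
SAMPLE = """
{'loss': 1.04667056, 'acc': 0.76542324, 'learning_rate': 2.5e-05, 'epoch': 0.02, 'global_step': 1}
{'loss': 0.56580772, 'acc': 0.83964577, 'learning_rate': 2e-05, 'epoch': 1.61, 'global_step': 100}
{'eval_loss': 0.54231346, 'eval_acc': 0.83992095, 'eval_runtime': 0.4726, 'eval_samples_per_second': 21.158, 'eval_steps_per_second': 10.579, 'epoch': 1.61, 'global_step': 100}
{'loss': 0.54164171, 'acc': 0.84827271, 'learning_rate': 3.33e-06, 'epoch': 1.94, 'global_step': 120}
"""

records = parse_metrics(SAMPLE)
train = [r for r in records if "loss" in r]       # per-step training records
evals = [r for r in records if "eval_loss" in r]  # evaluation records

print(f"train loss: {train[0]['loss']} -> {train[-1]['loss']}")
print(f"eval loss at step {evals[0]['global_step']}: {evals[0]['eval_loss']}")
```

The loss falls and the accuracy rises over the run, so the training stage itself looks healthy and the failure appears to be limited to inference.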

[INFO:swift] best_model_checkpoint: /home/work/pbg/swift-main/examples/pytorch/llm/output/qwen1half-0_5b-chat/v1-20240508-134134/checkpoint-124
[INFO:swift] images_dir: /home/work/pbg/swift-main/examples/pytorch/llm/output/qwen1half-0_5b-chat/v1-20240508-134134/images
[INFO:swift] End time of running main: 2024-05-08 13:51:44.413026

Inference:
swift infer --ckpt_dir '/home/work/pbg/swift-main/examples/pytorch/llm/output/qwen1half-0_5b-chat/v1-20240508-134134/checkpoint-124'
Inference error:
[INFO:swift] system: You are a helpful assistant.
[INFO:swift] Input exit or quit to exit the conversation.
[INFO:swift] Input multi-line to switch to multi-line input mode.
[INFO:swift] Input reset-system to reset the system and clear the history.
[INFO:swift] Input clear to clear the history.
<<< <<< <<< 你是谁
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/peft/peft_model.py", line 1190, in generate
    outputs = self.base_model.generate(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/transformers/generation/utils.py", line 1525, in generate
    return self.sample(
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/transformers/generation/utils.py", line 2622, in sample
    outputs = self(
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1173, in forward
    outputs = self.model(
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1058, in forward
    layer_outputs = decoder_layer(
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 773, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/haitaiwork/llm/anaconda3/envs/gpt/lib/python3.8/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 698, in forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: cutlassF: no kernel found to launch!

Could you please help identify the cause? Thank you very much!
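The traceback ends inside `torch.nn.functional.scaled_dot_product_attention`, so one way to narrow this down might be to call that function in isolation with each attention backend enabled in turn. The sketch below does not identify the cause; it only reports which backends run on the current GPU. It assumes a PyTorch 2.x build that still provides the `torch.backends.cuda.sdp_kernel` context manager (newer releases deprecate it), and `probe_sdpa` and the tensor shape are illustrative choices, not anything taken from swift or transformers.

```python
import importlib.util


def probe_sdpa(dtype_name="float16"):
    """Report which scaled_dot_product_attention backends work on this machine."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}  # PyTorch is not installed here

    import torch
    import torch.nn.functional as F

    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if not report["cuda_available"]:
        return report

    report["device"] = torch.cuda.get_device_name(0)
    report["capability"] = torch.cuda.get_device_capability(0)

    # Illustrative shape: (batch, heads, sequence length, head dim).
    dtype = getattr(torch, dtype_name)
    q = torch.randn(1, 16, 8, 64, device="cuda", dtype=dtype)

    backends = {
        "flash": dict(enable_flash=True, enable_math=False, enable_mem_efficient=False),
        "mem_efficient": dict(enable_flash=False, enable_math=False, enable_mem_efficient=True),
        "math": dict(enable_flash=False, enable_math=True, enable_mem_efficient=False),
    }
    for name, flags in backends.items():
        try:
            # Restrict SDPA to a single backend for this call.
            with torch.backends.cuda.sdp_kernel(**flags):
                F.scaled_dot_product_attention(q, q, q)
            report[name] = "ok"
        except RuntimeError as exc:
            report[name] = f"failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in probe_sdpa().items():
        print(f"{key}: {value}")
```

Running it once with "float16" and once with "float32" or "bfloat16", in the same environment as `swift infer`, would show whether the failure depends on the dtype, on the GPU's compute capability, or on one specific backend. The `cutlassF` prefix in the message suggests the memory-efficient backend, but that is something to confirm rather than assume.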

modelscope / ms-swift

Inference fails after LoRA fine-tuning with "RuntimeError: cutlassF: no kernel found to launch!" #877