modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Fine-tuning internvl-v1.5 raises KeyError: 'input_ids' #951

Closed: sunzx8 closed this issue 4 months ago

sunzx8 commented 5 months ago

Describe the bug: what the bug is and how to reproduce it, preferably with screenshots.

[screenshot: traceback ending in KeyError: 'input_ids']

Command used: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset /dev/shm/shawn/data/ftoy.jsonl --sft_type full

The data format is: {"query": "输出图片内容的markdown内容,如果有表格,则输出为html格式", "response": "```markdown\nAdaptive Quotient Filters\n\nConference '17, July 2017, Washington, DC, USA\n\n[34] Russell Housley, Warwick Ford, William Polk, and David Solo. 1999. Internet X.509 public key infrastructure certificate and CRL profile. Technical Report. M. Frans Kaashoek. 2002. The case for application-specific protocols. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP).", "images": ["/dev/shm/shawn/data/input/2405.10253v1/2405.10253v1-p16.png"]}
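For reference, a minimal sketch of writing and validating one record in this query/response/images format; the file path and field values below are placeholders, not from the original run:

```bash
# Write one sample record in the custom dataset format used above
# (query/response/images keys); all values here are placeholders.
cat > /tmp/ftoy_sample.jsonl << 'EOF'
{"query": "Output the image content as markdown", "response": "sample response", "images": ["/tmp/sample-page.png"]}
EOF

# Sanity-check each line: valid JSON carrying the three expected keys (needs jq).
jq -e 'has("query") and has("response") and has("images")' /tmp/ftoy_sample.jsonl
```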

Your hardware and system info: 8*L20 (eight NVIDIA L20 GPUs).



sunzx8 commented 5 months ago

I looked into this: the batch returns only the two image-related elements, with no input_ids.

[screenshots: batch contents showing image tensors but no input_ids key]

What could be causing this?

hjh0119 commented 5 months ago

The device map can be problematic with 8 GPUs; try 2 or 4 GPUs.
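As a sketch, the suggestion amounts to rerunning the original command restricted to four of the eight GPUs via CUDA_VISIBLE_DEVICES:

```bash
# Same sft command as above, limited to 4 GPUs to rule out the 8-GPU device map.
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type internvl-chat-v1_5 \
    --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 \
    --dataset /dev/shm/shawn/data/ftoy.jsonl \
    --sft_type full
```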

sunzx8 commented 5 months ago

Hello, my testing shows this is a max_length issue. Why does raising max_length from 2048 to 4096 trigger this error: RuntimeError: CUDA error: unspecified launch failure. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
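The error text itself points to the next diagnostic step; a minimal sketch (debug only, since synchronous launches slow training down):

```bash
# With CUDA_LAUNCH_BLOCKING=1, kernel launches run synchronously, so the
# stack trace points at the op that actually failed rather than a later call.
export CUDA_LAUNCH_BLOCKING=1
# Then rerun the same swift sft command with --max_length 4096.
```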

sunzx8 commented 5 months ago

One more question: how should I set things up to fine-tune across two machines with 16 GPUs in total?

hjh0119 commented 5 months ago

The CUDA error is likely OOM or a CUDA environment issue.

There is a multi-node multi-GPU example in the README.
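For reference, a sketch of a multi-node launch; the environment-variable names (NNODES, NODE_RANK, MASTER_ADDR, MASTER_PORT, NPROC_PER_NODE) and the address are assumptions to verify against the README example:

```bash
# Node 0 (master); node 1 runs the same command with NODE_RANK=1.
# MASTER_ADDR/MASTER_PORT are placeholders for the master node's address.
NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 NPROC_PER_NODE=8 \
swift sft \
    --model_type internvl-chat-v1_5 \
    --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 \
    --dataset /dev/shm/shawn/data/ftoy.jsonl \
    --sft_type full
```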

sunzx8 commented 5 months ago

> The CUDA error is likely OOM or a CUDA environment issue.
>
> There is a multi-node multi-GPU example in the README.

Another question: with the LoRA fine-tuning setup you provide, the parameter summary shows only a small fraction of parameters being trained, yet GPU memory consumption is identical to full-parameter training. Does this mean the model was not actually switched over to LoRA?

[screenshot: trainable parameter summary]

Actual GPU memory consumption is 241 GB, the same as full-parameter fine-tuning on coco-mini.

sunzx8 commented 5 months ago

The LoRA command for the run above:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset coco-mini-en-2 --sft_type lora
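Worth noting on the memory question: LoRA drops the optimizer states and gradients of the frozen base weights, but the full base model and its activations still occupy GPU memory, so peak usage can stay close to full-parameter training unless activation memory is also reduced. A sketch of the same run with common memory-reducing options; the --gradient_checkpointing flag is an assumption about this CLI version, so verify with swift sft --help:

```bash
# Same LoRA run with activation memory reduced; --gradient_checkpointing
# is assumed to exist in this swift version (verify with `swift sft --help`).
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type internvl-chat-v1_5 \
    --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 \
    --dataset coco-mini-en-2 \
    --sft_type lora \
    --max_length 2048 \
    --gradient_checkpointing true
```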

hjh0119 commented 4 months ago

The device map can be problematic with 8 GPUs; try 2 or 4 GPUs.