modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Question about fine-tuning for bounding boxes #2317

Open pange1802703882 opened 1 month ago

pange1802703882 commented 1 month ago

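Hello. I followed the documentation at https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md and fine-tuned on my own dataset, but the accuracy of the predicted bounding boxes is poor. I then modified transformers==4.45.2 as described in https://github.com/huggingface/transformers/pull/33487, and the results are still poor. I found that during training, position_ids is never empty while input_ids is always empty; during inference, position_ids is likewise never empty. What could be causing this?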

kjgfcdb commented 1 week ago

Same here