modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0
4.35k stars 383 forks

Could you put attention_mask back for visual language models #2226

Open YerongLi opened 1 month ago

YerongLi commented 1 month ago

Describe the feature: The `Template` class can perform padding, but `Qwen2VLTemplateMixin` and `InternLMXComposer2Template` only provide `im_mask`; there is no `attention_mask` for `input_ids` (in the case where padding is applied). Could you put the padding `attention_mask` back?

https://huggingface.co/internlm/internlm-xcomposer2-7b/blob/main/modeling_internlm_xcomposer2.py

Screenshot from 2024-10-11 06-25-27

Jintao-Huang commented 1 month ago

Build the `attention_mask` in the `data_collator`.
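The suggestion above can be sketched as a collator that pads variable-length `input_ids` and derives the `attention_mask` at batching time rather than inside the template. This is a minimal illustrative sketch, not the actual ms-swift `data_collator` implementation; the function name, signature, and `pad_token_id` default are assumptions.

```python
def data_collator(features, pad_token_id=0):
    # features: list of dicts, each with a variable-length "input_ids" list.
    # Right-pad every sequence to the longest one in the batch and build the
    # attention_mask alongside it (1 = real token, 0 = padding).
    # Hypothetical helper for illustration, not the ms-swift API.
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        ids = list(f["input_ids"])
        n_pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
    return batch
```

Building the mask here keeps template classes like `Qwen2VLTemplateMixin` free of padding logic: each template emits unpadded `input_ids` (plus any model-specific masks such as `im_mask`), and the collator alone decides the batch shape.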