SFT qwen/Qwen2-VL-7B-Instruct did not pass/process the image embedding for feedforwarding

YerongLi commented 3 weeks ago

Describe the bug

# Experimental environment: 2 * A10
# 2 * 19GB GPU memory
if [ -z "$1" ]; then
  echo "Error: No GPU argument provided."
  echo "Usage: $0 <GPU>"
  exit 1
fi
export CUDA_DEVICE_MAX_CONNECTIONS=1
export ds_master_port=$((29000 + RANDOM % 1000))

GPU=$1
GPUS_PER_NODE=$(echo $GPU | tr ',' '\n' | wc -l)
nproc_per_node=$GPUS_PER_NODE

PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=$GPU \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port $ds_master_port \
    llm_sft.py \
    --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
    --model_revision master \
    --dataset coco-en-mini#20000 \
    --sft_type lora \
    --reeng true \
    --lorra_alpha 16 \
    --user_tag '' \
    --assistant_tag '[/INST]' \
    --control_template "{type}" \
    --template_system "ixc_system" \
    --pos_type 'As a precise assistant solving a vision math problem, extract key information from the image, solve the following math problem, and carefully reason through each step to provide a truthful and accurate solution.' \
    --neg_type 'As a careless assistant solving a vision math problem, instead of understanding the image and question carefully, use random clues from the image to make up some reasoning and solve the following math problem.' \
    --target_layers "10,12,14,16,18,20" \
    --tuner_backend peft \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules q_proj k_proj v_proj \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn false \
    --deepspeed default-zero2 \
    --report_to none \
    --max_steps 200 \
    # --resume_from_checkpoint output/qwen2_5-7b/v17-20240926-073918/checkpoint-100
# $MODELS/Qwen2.5-7B
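The script counts processes from the comma-separated GPU list, then sets --gradient_accumulation_steps to 16 divided by that count. The aim is an effective global batch size of batch_size * nproc * accumulation = 16. Because `expr` does integer division, that only holds when the GPU count divides 16. Below is a minimal Python sketch of the arithmetic. The function names are hypothetical and not part of ms-swift.

```python
def count_gpus(gpu_arg: str) -> int:
    """Mirror the script's GPUS_PER_NODE: one process per comma-separated GPU id."""
    return len(gpu_arg.split(","))


def effective_batch_size(gpu_arg: str, per_device_batch: int = 1, budget: int = 16) -> int:
    """Global batch size produced by the script's settings (hypothetical helper)."""
    nproc = count_gpus(gpu_arg)
    # `expr 16 / N` truncates like Python's floor division for positive ints
    grad_accum = budget // nproc
    return per_device_batch * nproc * grad_accum


print(effective_batch_size("0,1"))    # 16: 2 GPUs * 8 accumulation steps
print(effective_batch_size("0,1,2"))  # 15: 3 GPUs * 5 steps, not 16
```

With 2 GPUs (the environment noted at the top of the script) the budget of 16 is met exactly.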

The forward pass in https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py receives inputs_embeds as None, which suggests that _post_encode is not taking effect:

[rank0]:   File "/home/yerong2/local/miniconda3/envs/qw/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1687, in forward
[rank0]:     print(inputs_embeds.shape)
[rank0]:           ^^^^^^^^^^^^^^^^^^^
[rank0]: AttributeError: 'NoneType' object has no attribute 'shape'
Train:   0%|
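The AttributeError above comes from a debug print(inputs_embeds.shape) added at line 1687 of the installed modeling_qwen2_vl.py. That print crashes as soon as `inputs_embeds` is None. When you are only probing which inputs reach `forward`, a None-safe helper reports what is missing instead of stopping the first training step. Below is a minimal sketch. The name `describe_inputs` is hypothetical, and it assumes only that tensors expose a `.shape` attribute, as `torch.Tensor` does.

```python
from typing import Any, Dict


def describe_inputs(**inputs: Any) -> Dict[str, str]:
    """Summarise forward() inputs without raising on None values (hypothetical helper)."""
    summary = {}
    for name, value in inputs.items():
        if value is None:
            summary[name] = "None"
        elif hasattr(value, "shape"):
            # torch.Size converts cleanly to a plain tuple
            summary[name] = str(tuple(value.shape))
        else:
            summary[name] = type(value).__name__
    return summary
```

Inside `forward`, a call such as `print(describe_inputs(input_ids=input_ids, inputs_embeds=inputs_embeds, pixel_values=pixel_values))` shows at a glance whether image data arrived as `pixel_values`, as precomputed `inputs_embeds`, or not at all.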

YerongLi commented 3 weeks ago

I made sure the template type is the Qwen2-VL template (the debug print below shows a Qwen2VLTemplate object), but the error persists, i.e. _post_encode still does not take effect.

 ===== qwfine/sft_repe.py ====
<utils.template.Qwen2VLTemplate object at 0x7f4b34eb6810>
 ===== qwfine/sft_repe.py ====

Jintao-Huang commented 3 weeks ago

please use torch>=2.0

YerongLi commented 3 weeks ago

Quoting: "please use torch>=2.0". torch 2.4.0 is already installed:

(qw) yerong2@ qwfine$ pip show torch
Name: torch
Version: 2.4.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/yerong2/local/miniconda3/envs/qw/lib/python3.11/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, bitsandbytes, deepspeed, evalscope, ms-opencompass, ms-vlmeval, peft, sentence-transformers, torchvision, trl, xtuner
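The pip show output already satisfies the suggested torch>=2.0, so the torch version is not the cause here. To assert the requirement from inside the environment, a standard-library check via `importlib.metadata` works. The parsing below is a simplified sketch that compares only the leading numeric release components. It is not a full PEP 440 implementation, and both function names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple:
    """'2.4.0+cu121' -> (2, 4, 0): keep only the leading numeric release parts."""
    parts = []
    for piece in v.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def torch_at_least(minimum: str = "2.0") -> bool:
    """True if the installed torch distribution is at least `minimum`."""
    try:
        installed = version("torch")
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    print(torch_at_least("2.0"))
```

For the environment shown above, version_tuple("2.4.0") is (2, 4, 0), which compares greater than (2, 0).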

YerongLi commented 3 weeks ago

I don't think the output of self._post_encode is ever added to res_extra (https://github.com/modelscope/ms-swift/blob/7594d19188cf2fd6f592b0216ccd421179616b38/swift/llm/utils/template.py#L343). I added debug prints after the loop to check whether this branch is reached:

# Debug: print a marker and stop the run if this branch is ever reached
for d in data:
    res_extra.append(self._post_encode(module, d))
print(' === qwfine/utils/template.py ===')
print('This branch')
print(' === qwfine/utils/template.py ===')
exit(0)

YerongLi commented 3 weeks ago

My mistake: once the correct template is actually used, _post_encode works.
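Per the thread, the fix depends on which template object does the encoding: _post_encode only does useful work once the correct Qwen2-VL template is in use. The sketch below models that with hypothetical stand-in classes (`BaseTemplate`, `Qwen2VLLikeTemplate`) and helpers (`run_post_encode`, `overrides_post_encode`). It is not ms-swift's code. It assumes the generic template's hook is a no-op that returns an empty dict, and shows how that leaves `inputs_embeds` as None, matching the traceback. It also shows a class-level check that is more direct than printing the template object.

```python
from typing import Any, Dict, List


class BaseTemplate:
    # Assumption: the generic template's hook contributes nothing.
    def _post_encode(self, module: Dict[str, Any], data: Dict[str, Any]) -> Dict[str, Any]:
        return {}


class Qwen2VLLikeTemplate(BaseTemplate):
    # Hypothetical VL override: turn token ids plus image features into embeddings.
    def _post_encode(self, module: Dict[str, Any], data: Dict[str, Any]) -> Dict[str, Any]:
        token_embeds = [module["embed"](t) for t in data["input_ids"]]
        image_feature = data.get("pixel_values", 0.0)
        # Toy "merge": add the image feature to every token embedding.
        return {"inputs_embeds": [e + image_feature for e in token_embeds]}


def run_post_encode(template: BaseTemplate, module: Dict[str, Any],
                    data: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Mirror the loop in the thread: collect one _post_encode result per sample."""
    res_extra = [template._post_encode(module, d) for d in data]
    embeds = [extra["inputs_embeds"] for extra in res_extra if "inputs_embeds" in extra]
    # forward() sees None when no template produced embeddings
    return {"inputs_embeds": embeds or None}


def overrides_post_encode(template: BaseTemplate) -> bool:
    """True if the template's class replaces the no-op base hook."""
    return type(template)._post_encode is not BaseTemplate._post_encode


module = {"embed": float}
data = [{"input_ids": [1, 2], "pixel_values": 0.5}]
print(run_post_encode(BaseTemplate(), module, data))         # {'inputs_embeds': None}
print(run_post_encode(Qwen2VLLikeTemplate(), module, data))  # {'inputs_embeds': [[1.5, 2.5]]}
```

Checking overrides_post_encode(template) right after the template is built catches the wrong-template case before any forward pass runs.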

modelscope / ms-swift #2162