hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Building a custom VLM modeled on LLaVA and running stage: pt fails with RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn; with visual_inputs: true, the batches read in the pt stage contain no image pixel_values #4707

Closed: RONINGOD closed this issue 1 month ago

RONINGOD commented 2 months ago

Reminder

System Info

rank0: Traceback (most recent call last):
rank0:   File "/root/workspaces/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
rank0:   File "/root/workspaces/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
rank0:   File "/root/workspaces/LLaMA-Factory/src/llamafactory/train/tuner.py", line 45, in run_exp
rank0:     run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
rank0:   File "/root/workspaces/LLaMA-Factory/src/llamafactory/train/pt/workflow.py", line 62, in run_pt
rank0:     train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
rank0:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 1885, in train
rank0:     return inner_training_loop(
rank0:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
rank0:     tr_loss_step = self.training_step(model, inputs)
rank0:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 3250, in training_step
rank0:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/accelerate/accelerator.py", line 2130, in backward
rank0:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/torch/_tensor.py", line 525, in backward
rank0:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/__init__.py", line 267, in backward
rank0:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
rank0:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
rank0: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
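For context, this error is generic PyTorch behavior, not LLaMA-Factory code: if no trainable parameter contributes to the loss (here, because the pt-stage batches drop the image pixel_values that would flow through the only trainable module, the multimodal projector), the loss tensor has no grad_fn and backward() raises exactly this RuntimeError. A minimal sketch with a stand-in frozen model:

```python
import torch

def frozen_backward_raises() -> bool:
    """Return True if backward() on a loss with no grad path raises RuntimeError.

    Stand-in for the situation above: every parameter feeding the loss is
    frozen, so the loss has requires_grad=False and no grad_fn.
    """
    frozen = torch.nn.Linear(4, 1)
    for p in frozen.parameters():
        p.requires_grad_(False)  # simulate a fully frozen model path

    loss = frozen(torch.randn(2, 4)).mean()
    try:
        loss.backward()  # raises: element 0 of tensors does not require grad
    except RuntimeError:
        return True
    return False
```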

Reproduction

```yaml
### model
model_name_or_path: /root/workspaces/LLaMA-Factory/siglip_qwen2-0.5b
visual_inputs: true

### method
stage: pt
do_train: true
finetuning_type: full
train_mm_proj_only: true

### dataset
# mllm_demo
dataset: llava_150k_en,llava_150k_zh
cutoff_len: 1024
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 64

### output
output_dir: saves/siglip_qwen2-0.5b/train/sft
logging_steps: 5
save_steps: 100
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```

Expected behavior

No response

Others

No response

BUAADreamer commented 1 month ago

How did you build your custom model?

For building custom models, I recommend this conversion script as a reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/convert_llava_weights_to_hf.py You can first use it to convert your model into the llava-1.5 format.

RONINGOD commented 1 month ago

> How did you build your custom model?
>
> For building custom models, I recommend this conversion script as a reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/convert_llava_weights_to_hf.py You can first use it to convert your model into the llava-1.5 format.

Hi, I built it exactly following llava-1.5, replacing the vision tower and the language model. This bug seems to stem from the design assumption that the pt stage pretrains only the LLM, so it reads text only. After I changed stage to sft, pretraining worked fine.
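For reference, the workaround described above amounts to changing only the stage key in the method section of the reported config (all other keys unchanged); this is a sketch based on the thread, not an official recipe:

```yaml
### method
stage: sft                 # changed from `pt`: the sft pipeline loads image pixel_values
do_train: true
finetuning_type: full
train_mm_proj_only: true   # still updates only the multimodal projector
```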

BUAADreamer commented 1 month ago

Yes. Since PT in LLaVA is essentially no different from SFT, you only need to run SFT on an image-caption dataset in the appropriate format to get the pretraining effect.
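For illustration, an image-caption sample in the style of LLaMA-Factory's multimodal demo dataset (mllm_demo); the image path and caption below are placeholders, and field names are an assumption based on that demo format:

```json
[
  {
    "messages": [
      { "role": "user", "content": "<image>Describe this image." },
      { "role": "assistant", "content": "A photo of a cat sitting on a windowsill." }
    ],
    "images": ["data/images/example.jpg"]
  }
]
```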