Closed RONINGOD closed 1 month ago
May I ask how you built your custom model?
Regarding custom models, here is a conversion script worth referencing: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/convert_llava_weights_to_hf.py You can first follow this script to convert your model into the llava-1.5 paradigm.
Hi, I built it exactly following llava-1.5, replacing the vision tower and the language model. This bug is presumably because pretraining was designed to use only the LLM, so it can only read text. After I changed the stage to sft, pretraining worked fine.
Yes. Since PT in LLaVA is essentially no different from SFT, you only need to run SFT on an image-caption dataset in the expected format to achieve the pretraining effect.
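To run "pretraining" through the SFT stage as suggested above, each image-caption pair needs to be wrapped in a multimodal SFT record. A minimal sketch, assuming the sharegpt-style layout of the `mllm_demo.json` dataset bundled with LLaMA-Factory (the `messages`/`images` field names and the `<image>` placeholder are taken from that example; verify them against your `dataset_info.json`):

```python
import json

def caption_to_sft_record(image_path, caption,
                          prompt="<image>Describe this image briefly."):
    """Wrap one image-caption pair as a multimodal SFT example.

    Field names follow the bundled mllm_demo.json; the prompt text is
    an illustrative placeholder, not part of the original thread.
    """
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": caption},
        ],
        "images": [image_path],
    }

records = [caption_to_sft_record("images/0001.jpg", "A dog running on a beach.")]
with open("caption_pretrain.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```

The resulting JSON file can then be registered in `dataset_info.json` and referenced from the `dataset:` field of the training config.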
Reminder
System Info
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/workspaces/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
[rank0]:   File "/root/workspaces/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank0]:   File "/root/workspaces/LLaMA-Factory/src/llamafactory/train/tuner.py", line 45, in run_exp
[rank0]:     run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
[rank0]:   File "/root/workspaces/LLaMA-Factory/src/llamafactory/train/pt/workflow.py", line 62, in run_pt
[rank0]:     train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 1885, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs)
[rank0]:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 3250, in training_step
[rank0]:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/accelerate/accelerator.py", line 2130, in backward
[rank0]:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/torch/_tensor.py", line 525, in backward
[rank0]:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/__init__.py", line 267, in backward
[rank0]:   File "/opt/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
[rank0]:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank0]: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
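The RuntimeError above occurs when the loss tensor is not connected to any trainable parameter. A minimal sketch of the likely mechanism, assuming (as the thread suggests) that the text-only `pt` path never runs the multimodal projector while `train_mm_proj_only: true` freezes everything else (dummy objects stand in for torch parameters; all names are illustrative):

```python
class Param:
    """Stand-in for a torch parameter: just a name and a grad flag."""
    def __init__(self, name, requires_grad):
        self.name = name
        self.requires_grad = requires_grad

def trainable_params_reached(params, used_in_forward):
    """Params that are both trainable and actually on the forward path."""
    return [p for p in params if p.requires_grad and p.name in used_in_forward]

params = [
    Param("language_model.weight", False),  # frozen by train_mm_proj_only
    Param("vision_tower.weight", False),    # frozen by train_mm_proj_only
    Param("mm_projector.weight", True),     # trainable, but unused below
]

# The text-only `pt` workflow forwards only through the language model,
# so the one trainable module never contributes to the loss:
reached = trainable_params_reached(params, {"language_model.weight"})
print(len(reached))  # 0 -> loss has no grad_fn, backward() raises
```

When that intersection is empty, `loss.backward()` has nothing to differentiate, matching the error message. Switching to `stage: sft` (as the reporter did) routes images through the projector, reconnecting the loss to the trainable weights.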
Reproduction
```yaml
### model
model_name_or_path: /root/workspaces/LLaMA-Factory/siglip_qwen2-0.5b
visual_inputs: true

### method
stage: pt
do_train: true
finetuning_type: full
train_mm_proj_only: true

### dataset
# mllm_demo
dataset: llava_150k_en,llava_150k_zh
cutoff_len: 1024
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 64

### output
output_dir: saves/siglip_qwen2-0.5b/train/sft
logging_steps: 5
save_steps: 100
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```
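Per the resolution in the thread, the fix was to change the stage from `pt` to `sft`. A sketch of the adjusted `method` block, with all other sections of the config left unchanged:

```yaml
### method
stage: sft            # was: pt -- the text-only PT path drops image inputs
do_train: true
finetuning_type: full
train_mm_proj_only: true
```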
Expected behavior
No response
Others
No response