Pro-flynn opened 11 months ago
How do you think I should adjust my training strategy?
I think the number of epochs is too big. My model is also a little bit overfitted after a full fine-tune on 20k samples for 2 epochs (batch size 4), and the loss was 0.67.
What's the current inference performance? Do you think LLaVA is suitable for this kind of object detection task?
Maybe you can check OCR LLaVA; someone has already done it, and they used an OCR dataset in both pretraining and fine-tuning.
LLMs have shown outstanding performance on OCR, so I think LLaVA can do it.
https://llavar.github.io/ Check this
I've also adopted a similar approach for training my model. However, I find myself perplexed upon reviewing the training statistics.
wandb: Run history:
wandb: train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/learning_rate ▄███████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb: train/loss █▆▇▆▆▆▆▅▆▅▄▄▄▃▂▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: train/total_flos ▁
wandb: train/train_loss ▁
wandb: train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb: train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb: train/epoch 20.0
wandb: train/global_step 360
wandb: train/learning_rate 0.0
wandb: train/loss 0.0072
wandb: train/total_flos 1967070044160.0
wandb: train/train_loss 0.4161
wandb: train/train_runtime 848.715
wandb: train/train_samples_per_second 3.323
wandb: train/train_steps_per_second 0.424
I'm puzzled about the distinction between train/train_loss with a value of 0.4161 and train/loss with a value of 0.0072. Could someone please clarify this for me?
Also, I have noticed the same issue. The results on the unseen dataset are really bad.
Could anyone tell me what hardware you are fine-tuning on? I tried on one A10G with batch_per_device=1, but I'm getting an OOM error.
After trying a couple of different machines, I used an A100 GCP instance and it worked like a charm.
You can try lowering the number of epochs. Check out the example here: I fine-tuned for 3 epochs with batch size 8 on 100 GPT-4V-captioned anime examples, and it already works great: https://github.com/haotian-liu/LLaVA/issues/766#issuecomment-1800214174. You can also take a look at the wandb logs; the training loss should not be too low, which would indicate overfitting. Additionally, mixing in a few samples from LLaVA-Instruct or the llava-v1.5 data mixture may also help reduce the overfitting.
@Nomiluks One of them is probably the end-of-epoch stat (there will be just one number for a single experiment), and the other may be the last-iteration stat (one number for each iteration, but only the last iteration is displayed). Looking at the wandb interface may help you better understand the stats.
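For what it's worth, here is a rough sketch of why the two numbers can differ so much. It is not from this thread; it assumes the usual HuggingFace Trainer behavior, where the final train_loss is an average over the whole run while train/loss is only the most recently logged step, and the per-step values are made up.

```python
# Toy illustration (hypothetical per-step losses, not the Trainer's actual code):
# "train/loss" is the value logged at each logging step (only the last one is shown
# in the run summary), while the final "train/train_loss" is the average over the run.
step_losses = [0.9, 0.6, 0.4, 0.2, 0.05, 0.0072]  # assumed values for illustration

last_logged_loss = step_losses[-1]                      # what appears as train/loss
run_average_loss = sum(step_losses) / len(step_losses)  # what appears as train/train_loss

print(f"train/loss (last step):         {last_logged_loss:.4f}")
print(f"train/train_loss (run average): {run_average_loss:.4f}")
```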
Thanks for your response @haotian-liu
I'm working on using LLaVA to identify pixel-based image forgery or tampering in my dataset. I currently have 100 samples, and I'm considering LoRA-based fine-tuning as suggested in the documentation. Do you believe this sample size is sufficient for effective fine-tuning? Additionally, I'm open to any advice or best practices for training LLaVA to specifically detect image forgery. Your insights would be greatly appreciated!
Training Example:
{
    "id": "tampered_654c8796140dc970e0d179d5-back",
    "image": "tampered_654c8796140dc970e0d179d5-back.jpeg",
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nAnalyze the provided document image with the objective of detecting potential instances of image forgery resulting from digital tampering or manipulation. Identify all manipulated regions and present the results in the following format: [[x0, y0, x1, y1]]. If no tampered regions are identified, please return [[]]."
        },
        {
            "from": "gpt",
            "value": "[[0.578, 0.604, 0.938, 0.99]]"
        }
    ]
}
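As a side note, here is a minimal sketch (a hypothetical helper, not part of the repo, assuming the custom-data layout shown above) to sanity-check entries before launching fine-tuning:

```python
import json

# Hypothetical sanity check for a LLaVA-style custom dataset file
# (same layout as the example above: id, image, conversations with from/value turns).
def validate_entries(path):
    with open(path) as f:
        data = json.load(f)
    for i, entry in enumerate(data):
        missing = {"id", "image", "conversations"} - entry.keys()
        assert not missing, f"entry {i}: missing keys {missing}"
        turns = entry["conversations"]
        assert turns and turns[0]["from"] == "human", f"entry {i}: first turn should be from 'human'"
        assert "<image>" in turns[0]["value"], f"entry {i}: expected an <image> tag in the first human turn"
    print(f"{len(data)} entries look structurally OK")

validate_entries("my_custom_data.json")  # hypothetical path
```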
I wonder, when LLaVA faces a brand-new domain, should we do something like fine-tuning the visual encoder as a first step, since right now the vision encoder is not tuned?
@Nomiluks According to my experiments, a dataset of 100 samples may easily cause overfitting. I tried to enlarge my dataset to 8000 entries (containing a few LLaVA instruction samples). However, the result shows declining performance; it cannot even interpret the "man behind the taxi" example. I am still figuring out the cause.
Yes, I am also having the same problem. Have you found out the cause?
I think the number of epochs is too big. My model is also a little bit overfitted after a full fine-tune on 20k samples for 2 epochs (batch size 4), and the loss was 0.67.
Is the 0.67 the overall loss you're referring to? It seems a bit high; typically, we aim for a loss close to 0 for a well-fit model. This value might suggest that the model is underfitting. Could you provide more context or details about the training process? It's important to assess whether this level of loss is acceptable for your specific use case.
@Nomiluks According to my experiments, a dataset of 100 samples may easily cause overfitting. I tried to enlarge my dataset to 8000 entries (containing a few LLaVA instruction samples). However, the result shows declining performance; it cannot even interpret the "man behind the taxi" example. I am still figuring out the cause.
Yeah, it seems it is unable to learn; the model either overfits or underfits.
I am wondering how big the domain shift is. For example, for extremely detailed anime captioning, I was actually surprised by what it can do with 100 examples: https://github.com/haotian-liu/LLaVA/issues/766#issuecomment-1800214174
@haotian-liu Here are two examples from my side, and the loss curve in 3 epochs:
The loss curve is very concerning here. Here is one of the LoRA fine-tuning loss curves on Stable Diffusion prompts.
The initial spike suggests that there is something wrong.
@Pro-xiaowen
Btw, just noticed this: [0.7677605, 0.815028, 0.8906875, 0.92288]
These coordinates seem overly precise. You may just need three digits; the later digits may just cause the model to hallucinate.
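In case it's useful, a quick sketch of trimming the coordinates to three decimals before building the gpt answer strings (the answer format is assumed to match the examples above):

```python
# Round normalized box coordinates to three decimals before writing the target string,
# following the suggestion above that the extra digits may only add noise.
def format_boxes(boxes, ndigits=3):
    rounded = [[round(v, ndigits) for v in box] for box in boxes]
    return str(rounded)

print(format_boxes([[0.7677605, 0.815028, 0.8906875, 0.92288]]))
# -> [[0.768, 0.815, 0.891, 0.923]]
```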
@haotian-liu Thanks for your reply! May I ask how many samples are included in the dataset, i.e., the number of extra LLaVA instruction samples and the total number of samples?
I think the number of epochs is too big. My model is also a little bit overfitted after a full fine-tune on 20k samples for 2 epochs (batch size 4), and the loss was 0.67.
Is the 0.67 the overall loss you're referring to? It seems a bit high; typically, we aim for a loss close to 0 for a well-fit model. This value might suggest that the model is underfitting. Could you provide more context or details about the training process? It's important to assess whether this level of loss is acceptable for your specific use case.
The LLM generates more words beyond your answer, which does not mean the answer is wrong. In my experiments, a loss between 0.6 and 0.8 is normal. If you want the model to be more accurate, you may focus on increasing the size of the dataset. Here is my loss, and the model works well. ^_^ Hope it helps.
@Linziyang1999 May I ask the number of samples included in your dataset (How many customized samples and original LLaVA samples)?
My custom samples number 20k, and I found that an error is raised during training if the dataset only has image conversations, so I added a few conversations from mix665k without images (10 maybe? just enough to make it work).
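If anyone wants to do the same, here is a rough sketch of mixing a handful of text-only conversations into a custom set. The mix665k file name and the assumption that its text-only samples simply lack the image key follow the v1.5 data release; the custom-data paths are hypothetical.

```python
import json
import random

# Sketch: pad a custom dataset with a few text-only conversations from the
# llava_v1_5_mix665k.json release so training does not break when every custom
# sample contains an image, as described above.
with open("my_custom_data.json") as f:          # hypothetical path
    custom = json.load(f)
with open("llava_v1_5_mix665k.json") as f:      # LLaVA v1.5 data mixture
    mix = json.load(f)

text_only = [s for s in mix if "image" not in s]  # conversations without an image
custom += random.sample(text_only, 10)            # "10 maybe?" as suggested above
random.shuffle(custom)

with open("my_custom_data_mixed.json", "w") as f:
    json.dump(custom, f)
```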
@haotian-liu In my case, the loss seems to drop very quickly after only 30 steps. I have checked three things, including the dataset fields (id, image, etc.). Is there any obvious error that can be observed from my fine-tuning script, or does anyone have any idea about what happened? By the way, I used 4 A100 (80GB) GPUs.
deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path liuhaotian/llava-v1.5-13b \
    --version v1 \
    --data_path dataset_finetune/llava_finetune_task_v2.json \
    --image_folder ./playground/data/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-13b-task-lora-v2 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 10 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 2 \
    --lazy_preprocess True
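One thing worth double-checking with this script: the effective batch size is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs, which matters both for OOM issues and for how many optimizer steps you actually get per epoch. A tiny sketch, with the GPU count of 4 taken from the comment above:

```python
# Effective batch size for the script above (4 A100s as mentioned in the comment).
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
num_gpus = 4

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 128 samples per optimizer step
```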
I think the number of epochs is too big. My model is also a little bit overfitted after a full fine-tune on 20k samples for 2 epochs (batch size 4), and the loss was 0.67.
We found the fine-tuned LLaVA model is underfitting when setting the epochs to 1-10; even the predictions on the training data are wrong! @Linziyang1999
So will num_train_epochs 100 make your loss smaller and the predictions more accurate? (I also ran into trouble; my fine-tuning didn't work.) @Pro-xiaowen
Hi guys, I am on Colab running on an A100 and trying to fine-tune using the code below, but I'm facing errors related to ./checkpoint, train_mem.py, and other train.py files.
My code:
```
!git clone https://github.com/haotian-liu/LLaVA.git
%cd /content/LLaVA
!pip install -q gradio .
!bash /content/LLaVA/scripts/v1_5/finetune.sh
```
Can you guys help me with the correct way to fine-tune it?
Gibberish output (even on training data) with a weird loss curve after full fine-tuning. Can someone please help me fix this?
I am trying to fully fine-tune the text-only Vicuna-v1.5 model using my custom QnA data comprising 160k QA pairs, with the same fine-tuning script as provided in finetune_task.sh but omitting the multimodal parameters. Here is the loss curve at 2.4 epochs: wandb report
My loss function trend is in line with yours, and the fine-tuning is poorly done. Sad.
Hello @rohitpanjwani03,
Were you able to fine-tune? I'm also trying to fine-tune; was anything missing in your fine-tuning process?
I'm using Replicate to fine-tune the model: https://replicate.com/ravi-teja-konda/llava_finetune/versions/58ea2fa644ef90a63c50bc608a532e2acd5792208978760164f3db900247f062
But as Replicate currently does not support changing the hyperparameters, it looks like I need to fine-tune on my own by running it in Colab. Do we have any alternatives, like on Hugging Face, and has anyone tried them?
Just in case someone is having problems during inference: if you have a script that uses /llava/eval/run_llava.py as a baseline for inference, you should be careful with the args. In my case I noticed that the run_llava.py file will merge the LoRA weights if you specify a model-base, and mine were already merged, hence the poor performance at inference.
If the weights are merged:
- model-path should include the path to your fine-tuned model.
- model-base should be None.
If the weights are not merged, you should specify a model-base so they get merged:
- model-path should include the path to your fine-tuned model.
- model-base should be your base model, e.g. 'liuhaotian/llava-v1.5-7b'.
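A small sketch of that decision (it assumes an unmerged LoRA checkpoint keeps the usual PEFT adapter_config.json file; the helper name and checkpoint path are made up):

```python
import os

# Heuristic: decide whether run_llava.py needs a model-base, following the advice above.
# An unmerged LoRA adapter directory typically contains adapter_config.json (PEFT convention);
# a fully merged checkpoint does not.
def pick_model_base(model_path, base="liuhaotian/llava-v1.5-7b"):
    is_lora_adapter = os.path.exists(os.path.join(model_path, "adapter_config.json"))
    return base if is_lora_adapter else None  # None means the weights are already merged

print(pick_model_base("./checkpoints/my-llava-lora"))  # hypothetical checkpoint path
```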
Question
After fine-tuning on my custom data, the fine-tuned LLaVA model is overfitting. In my experiments, I followed your instructions (cited in https://github.com/haotian-liu/LLaVA/blob/main/docs/Finetune_Custom_Data.md).
I converted my data to the required format, as follows:
I used the official script (cited in https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune_task_lora.sh), as follows:
We find the fine-tuned LLaVA model is underfitting when setting the epochs to 1-10, so we set the epochs to 50-100; however, the fine-tuned model then overfits.
We find that the train loss = 0 when training ends, and the performance on the test data is very poor.