hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Memory error during tokenization while fine-tuning LLaVA-1.5-7B-Chat on more than 8000 images #4450

Closed: Hassaan68 closed this issue 4 days ago

Hassaan68 commented 4 days ago

Reminder

System Info

I am using 8 GPUs to fine-tune LLaVA-1.5-7B-Chat on more than 8000 images, but the tokenizer tries to tokenize all of the images at once, causing a memory error. 8300 is the maximum number of images I am able to train on.

Reproduction

Fine-tune LLaVA on more than 8000 images.

Expected behavior

There should be a distributed way to tokenize and load the images one by one.

Others

No response

hiyouga commented 4 days ago

Try dataset streaming: set streaming: true.
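
For reference, a minimal sketch of what this looks like with the CLI flags used later in this thread (flag and dataset names taken from the full command posted below). Note that with a streamed, iterable dataset the trainer cannot count epochs, so max_steps has to be set explicitly:

# Sketch only: streaming tokenizes and loads samples on the fly instead of
# preprocessing the whole dataset up front; because an iterable dataset has
# no known length, max_steps replaces num_train_epochs as the stopping rule.
llamafactory-cli train \
    --model_name_or_path llava-hf/llava-1.5-7b-hf \
    --dataset icentia11k \
    --streaming True \
    --max_steps 10000
# (remaining arguments as in the full command posted below)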

Hassaan68 commented 2 days ago

@hiyouga I am still facing the issue with streaming: true and max_steps: 10000. I am fine-tuning LLaVA on 93000 images, and the tokenizer reports a "No space left on device" error after tokenizing around 52000 images. I can see that my SageMaker cache is 75 GB at that point, which fills the disk. How can I work around this issue?

Full Command:

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path llava-hf/llava-1.5-7b-hf \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template vicuna \
    --flash_attn fa2 \
    --visual_inputs True \
    --dataset_dir data \
    --dataset icentia11k \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 10.0 \
    --max_steps 10000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/LLaVA1.5-7B-Chat/lora/train_2024-06-26-11-09-00 \
    --fp16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout 0 \
    --use_dora True \
    --lora_target all \
    --streaming True
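
Not an official fix, but since the "No space left on device" error above points at the tokenization cache filling the default disk, a common workaround, assuming the preprocessing goes through the Hugging Face datasets cache, is to point the cache at a larger volume before launching, or to clear the stale cache. The path below is a placeholder; substitute a mount with enough free space:

# Sketch: redirect the Hugging Face cache to a larger volume, then re-run the
# training command above. /opt/ml/hf_cache is a placeholder path.
export HF_HOME=/opt/ml/hf_cache
export HF_DATASETS_CACHE=/opt/ml/hf_cache/datasets

# Optionally clear the cache that already filled the default location.
rm -rf ~/.cache/huggingface/datasets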