tatsu-lab / stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.
https://crfm.stanford.edu/2023/03/13/alpaca.html
Apache License 2.0
29.55k stars · 4.06k forks

torch.cuda.OutOfMemoryError: CUDA out of memory. #189

Open Ahtesham00 opened 1 year ago

Ahtesham00 commented 1 year ago

I am getting this error on a single A100 80GB while loading llama-7b.

I tried reducing the batch size and also changing --gradient_accumulation_steps, but I was not able to work it out.

The only way I was able to run it was by using model.cuda().half(), but when I tested the saved model it output something like "?? ?? ?? ?? ?? ??" instead of generating text.

Error

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 774.00 MiB (GPU 0; 80.00 GiB total capacity; 71.96 GiB already allocated; 791.50 MiB free; 72.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
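As the message itself suggests, allocator fragmentation can sometimes be worked around through PYTORCH_CUDA_ALLOC_CONF. A minimal sketch is below; the 128 MiB value is only an example, and this helps with fragmentation, not with a model that genuinely does not fit.

```python
# Sketch: set the allocator option the error message points to. It must be set
# before the first CUDA allocation, e.g. at the very top of train7b.py.
import os

os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")
```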

I am using the following command to execute the script:

torchrun --nproc_per_node=1 --master_port=5050 train7b.py \
  --model_name_or_path ./7bWeights/llama-7b \
  --data_path ./alpaca_data_few.json \
  --bf16 True \
  --output_dir ./Dmodel7b \
  --num_train_epochs 3 \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 2000 \
  --save_total_limit 1 \
  --learning_rate 2e-5 \
  --weight_decay 0. \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --fsdp "full_shard auto_wrap" \
  --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
  --tf32 True
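If it helps, the usual memory-saving knobs on top of this configuration are FSDP CPU offload and gradient checkpointing. Below is a hedged sketch of the equivalent TrainingArguments; the values mirror the flags above, and whether this is enough to fully fine-tune llama-7b on a single 80GB A100 is not guaranteed.

```python
# Sketch: same setup as the command above, plus CPU offload and gradient
# checkpointing as memory-saving measures. Not a verified working config.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./Dmodel7b",
    bf16=True,
    tf32=True,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    gradient_checkpointing=True,             # recompute activations instead of storing them
    fsdp="full_shard auto_wrap offload",     # "offload" adds FSDP CPU offload
    fsdp_transformer_layer_cls_to_wrap="LlamaDecoderLayer",
)
```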


Results from testing the saved model:

[screenshot of the model output]
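One quick way to confirm whether the half() run corrupted the checkpoint is to scan the saved weights for NaN/Inf values; fp16 overflow during training is a common cause of outputs like "?? ?? ??". The sketch below assumes a transformers version with LLaMA support and uses the --output_dir from the command above.

```python
# Rough sketch: check the saved checkpoint for NaN/Inf weights, which would
# explain the garbled "?? ?? ??" generations from the saved model.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("./Dmodel7b")
bad = [name for name, param in model.named_parameters()
       if torch.isnan(param).any() or torch.isinf(param).any()]
print(f"{len(bad)} parameter tensors contain NaN/Inf values")
```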

tinylamb commented 1 year ago

Did you manage to solve this error on a single A100 80GB?

Ahtesham00 commented 1 year ago

No, not yet. I have not found any solution, so now I am looking to get more GPUs to train it.

Regarding the saved model: it seems like I am getting a corrupted saved model.