huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Load llama-2-70b model need too much CPU memory #32051

Closed JuiceLemonLemon closed 6 days ago

JuiceLemonLemon commented 1 month ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

  1. Download Alpaca code. https://github.com/tatsu-lab/stanford_alpaca

  2. Run the command below to load the Llama-2-70b model:

         torchrun --nproc_per_node=8 --master_port=29505 train.py \
             --model_name_or_path ../models/Llama-2-70b-hf/ \
             --data_path ./alpaca_data.json \
             --bf16 True \
             --output_dir ./output \
             --num_train_epochs 1 \
             --per_device_train_batch_size 1 \
             --per_device_eval_batch_size 1 \
             --gradient_accumulation_steps 1 \
             --evaluation_strategy "no" \
             --save_strategy "steps" \
             --save_steps 2000 \
             --save_total_limit 1 \
             --learning_rate 1e-5 \
             --weight_decay 0. \
             --warmup_ratio 0.03 \
             --lr_scheduler_type "cosine" \
             --logging_steps 1 \
             --fsdp "full_shard auto_wrap" \
             --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
             --tf32 True \
             --report_to none

  3. The model fails to load: CPU memory usage exceeds 1 TB, and the server hangs.

Expected behavior

Loading the Llama-2-70b model on 8 GPUs consumes too much CPU memory. How can this issue be fixed?
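A plausible explanation (an assumption on my part, not confirmed in this thread) is that with `torchrun --nproc_per_node=8`, each of the 8 ranks materializes a full copy of the model in CPU RAM before FSDP shards it. A quick back-of-the-envelope check:

```python
# Rough estimate of peak CPU memory if every torchrun rank loads a full
# copy of Llama-2-70b before FSDP sharding (assumed failure mode, not
# confirmed in this thread).
NUM_RANKS = 8          # --nproc_per_node=8
NUM_PARAMS = 70e9      # Llama-2-70b parameter count (approximate)
BYTES_PER_PARAM = 2    # bf16 weights (--bf16 True)

total_bytes = NUM_RANKS * NUM_PARAMS * BYTES_PER_PARAM
total_tib = total_bytes / 2**40
print(f"Estimated peak CPU memory: {total_tib:.2f} TiB")
# prints "Estimated peak CPU memory: 1.02 TiB"
```

This lines up with the observed >1 TB usage; fixes for this class of problem typically involve loading the weights on only one rank and broadcasting them (e.g. via FSDP's `sync_module_states` with meta-device initialization on the other ranks), though that is a general suggestion rather than something verified against this exact setup.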

amyeroberts commented 1 month ago

Hi @JuiceLemonLemon, thanks for opening this issue!

Without knowing the GPUs you're running on, it's hard to say what's reasonable in terms of CPU offloading. Have you inspected the memory usage with tools like nvidia-smi and top to ensure the model is loading as expected?
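For the CPU side of that check, one option is to poll /proc/meminfo while the model loads. A minimal sketch, assuming a Linux host (the helper name is mine, not part of any library):

```python
def available_cpu_mem_gib():
    """Return MemAvailable from /proc/meminfo in GiB (Linux only)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                kib = int(line.split()[1])  # value is reported in kiB
                return kib / 2**20
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

print(f"Available CPU memory: {available_cpu_mem_gib():.1f} GiB")
```

Calling this periodically (or just watching `top`/`free -h`) during model loading shows whether available memory drops roughly once per rank, which would point at each process loading its own full copy.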

As the command comes from the https://github.com/tatsu-lab/stanford_alpaca repo, I'd suggest opening an issue there; its maintainers will have more knowledge of and experience with the expected behaviour and possible gotchas.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.