jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models

OOM on two 80GB GPUs #49

Open kyleliang919 opened 8 months ago

kyleliang919 commented 8 months ago
accelerate launch finetune.py \
    --output-dir output/mistral-yarn-7b-64k \
    --model mistralai/Mistral-7B-v0.1 \
    --architecture mistral \
    --scaling-factor 2 \
    --max-position-embeddings 16384 \
    --dataset emozilla/yarn-train-tokenized-8k-mistral \
    --sliding-window-attention-schedule 4096 \
    --lr-schedule constant \
    --learning-rate 0.000001 \
    --max-train-steps 1000

It hits the OOM error both with and without LoRA. This is at only an 8K sequence length, so memory consumption should be around 4x smaller than training at a 16K sequence length.

accelerate is configured to use two GPUs and FSDP.
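
For reference, a minimal sketch of what a two-GPU FSDP accelerate config can look like (illustrative only, not the exact file used here; key names vary between accelerate versions, and the file name fsdp_config.yaml and MistralDecoderLayer as the wrap class are assumptions):

# Sketch only: hand-written accelerate config for 2 GPUs with FSDP full sharding.
# Older accelerate versions expect the integer 1 instead of the string FULL_SHARD.
cat > fsdp_config.yaml <<'EOF'
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16
num_machines: 1
num_processes: 2                      # one process per GPU
fsdp_config:
  fsdp_sharding_strategy: FULL_SHARD  # shard params, grads, and optimizer state
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: MistralDecoderLayer
  fsdp_offload_params: false          # true trades speed for extra memory headroom
  fsdp_state_dict_type: FULL_STATE_DICT
EOF

The file is then passed with accelerate launch --config_file fsdp_config.yaml finetune.py plus the flags above.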

edisonzf2020 commented 7 months ago

+1

TracyPlus commented 5 months ago

+1

YL-9 commented 4 months ago

I also encountered this problem. Have you solved it yet? @kyleliang919 @edisonzf2020

kyleliang919 commented 4 months ago

unfortunately no, I think you probably need at least 320 GB to handle this run.
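
A rough back-of-envelope for that number (my own estimate, not from the repo): full fine-tuning with Adam in mixed precision keeps about 16 bytes of persistent state per parameter (2 B bf16 weights + 2 B bf16 gradients + 4 B fp32 master weights + 8 B fp32 Adam moments), before counting activations:

# Rough estimate only, assuming ~7.24B parameters and ~16 bytes/parameter.
echo "7240000000 * 16 / 1024^3" | bc    # ~107 GiB of weights, grads, and optimizer state

Sharded over two 80 GB GPUs that is already ~54 GiB per GPU, and the activations for 8K-token sequences come on top of it, which is roughly in line with the 320 GB figure.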

YL-9 commented 4 months ago

> unfortunately no, I think you probably need at least 320 GB to handle this run.

Thank you for your reply. I have 4x A100, but there is a process on each GPU, so it still OOMs just like with 2x A100; I don't know how to configure it. QAQ
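
For reference, a launch along these lines should put one FSDP-sharded process on each of the four GPUs instead of a full replica per card (illustrative only; it reuses the config sketch from the first post, and I believe --num_processes on the command line overrides the value in the config file):

accelerate launch --config_file fsdp_config.yaml --num_processes 4 finetune.py \
    --output-dir output/mistral-yarn-7b-64k \
    --model mistralai/Mistral-7B-v0.1 \
    --architecture mistral \
    --scaling-factor 2 \
    --max-position-embeddings 16384 \
    --dataset emozilla/yarn-train-tokenized-8k-mistral \
    --sliding-window-attention-schedule 4096 \
    --lr-schedule constant \
    --learning-rate 0.000001 \
    --max-train-steps 1000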

disperaller commented 2 months ago

> unfortunately no, I think you probably need at least 320 GB to handle this run.

I used 8x A800 (80 GB) GPUs to run the following, but it still hits an OOM error. Am I missing something, or did I set something incorrectly?

accelerate launch finetune.py \
    --output-dir output/yarn-mistral-7b-64k \
    --model MODEL/Mistral/7b_01 \
    --architecture mistral \
    --scaling-factor 8 \
    --max-position-embeddings 16384 \
    --dataset data/emozilla____yarn-train-tokenized-16k-mistral \
    --sliding-window-attention-schedule 65536 \
    --lr-schedule constant \
    --learning-rate 0.000001 \
    --max-train-steps 1000 \
    --gradient-accumulate-every 2 \
    --deepspeed \
    --batch-size 1 \
    --wandb mistral_7b_yarn_64k
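
For reference, if the --deepspeed path goes through accelerate's DeepSpeed integration (I am not sure how finetune.py wires this flag), the usual way to cut per-GPU memory further is ZeRO stage 3 with CPU offload. A minimal sketch, with ds_zero3.yaml as an assumed file name and key names that may differ between accelerate versions:

# Sketch only: accelerate config selecting DeepSpeed ZeRO-3 with optimizer and
# parameter offload to CPU.
cat > ds_zero3.yaml <<'EOF'
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
mixed_precision: bf16
num_machines: 1
num_processes: 8                  # one process per A800
deepspeed_config:
  zero_stage: 3                   # shard params, grads, and optimizer state
  offload_optimizer_device: cpu
  offload_param_device: cpu
  gradient_accumulation_steps: 2
  zero3_init_flag: true
EOF

It would then be passed with accelerate launch --config_file ds_zero3.yaml, in place of or alongside the --deepspeed flag, depending on how the script expects to receive its DeepSpeed settings.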