jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models
MIT License

deepspeed config crashed for `auto` and OOM #45

Closed tpoisonooo closed 10 months ago

tpoisonooo commented 10 months ago

1. Using DeepSpeed

The config file deepspeed/zero3.json throws an error: `auto` can't be used. I edited the config myself (not sure whether it's correct) just to get it running:

(screenshot: the modified zero3.json config)
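For reference, below is a minimal sketch of what an explicit ZeRO-3 config without `auto` might look like. It is written as Python that dumps the JSON so the assumptions can be commented; the batch/offload values and the output filename `zero3-noauto.json` are illustrative guesses, not values taken from this repo:

```python
# Sketch of a ZeRO-3 config with explicit values instead of `auto`.
# All numbers are illustrative assumptions, not values from this repo.
import json

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        # offloading optimizer/param state to CPU trades speed for memory
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": 1,
    "train_micro_batch_size_per_gpu": 1,
    # must equal micro_batch * grad_accum * number_of_GPUs (8 GPUs assumed here)
    "train_batch_size": 8,
    "gradient_clipping": 1.0,
}

# Hypothetical filename; point your accelerate/DeepSpeed setup at wherever you save it.
with open("deepspeed/zero3-noauto.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```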

Command used:

accelerate launch finetune.py \
    --output-dir output/yarn-7b-32k \
    --model NousResearch/Llama-2-7b-hf \
    --learning-rate 0.00001 \
    --lr-schedule constant \
    --scaling-factor 8 \
    --deepspeed

Then it OOMs.

2. Without DeepSpeed

I disabled DeepSpeed and dynamo in `accelerate config`. The first configuration in the default train.sh should be the 64k-length one, and it OOMs:

# run `accelerate config` first. pass --deepspeed to finetune.py if using DeepSpeed

accelerate launch finetune.py \
    --output-dir output/yarn-7b-64k \
    --model NousResearch/Llama-2-7b-hf

3. Code error

Does the DDP setup need to be changed? This is my first time using it, so I'm not sure whether it's correct. (screenshot)

Question

Looking at x.shape, it is torch.Size([1, 65536, 4096]); even a single 80 GB GPU doesn't seem to be enough.
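A rough back-of-the-envelope check supports this, assuming Llama-2-7B's 32 layers and hidden size 4096, bf16 activations, and ignoring gradient checkpointing (which the training script may enable):

```python
# Rough memory estimate for a 65536-token sequence through Llama-2-7B.
# Assumptions: 32 layers, hidden size 4096, bf16 (2 bytes/element),
# no gradient checkpointing; the "8 tensors per layer" factor is a guess.
seq_len, hidden, layers, bytes_per_elem = 65536, 4096, 32, 2

weights_gb = 7e9 * bytes_per_elem / 2**30                      # ~13 GB of bf16 weights
hidden_state_gb = seq_len * hidden * bytes_per_elem / 2**30    # one [1, 65536, 4096] tensor: ~0.5 GB
activations_gb = hidden_state_gb * 8 * layers                  # ~128 GB kept for backward

print(f"weights ~{weights_gb:.0f} GB, activations ~{activations_gb:.0f} GB")
```

Even before gradients and optimizer states, that comfortably exceeds a single 80 GB card, which is why some form of sharding or offloading is needed.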

So should tensor parallelism (TP) be configured somewhere? The README doesn't seem very newcomer-friendly, though QAQ

WeiyaoLuo commented 10 months ago

I am in the same situation as you 🥴, and the code doesn't seem to allow setting model parallelism. I have 8× 40 GB A100s that always OOM. I hope to solve this problem; how should I configure it? Otherwise there's no way to finetune (I have tried a smaller 8k dataset and a smaller LLaMA model, but it still OOMs). @tpoisonooo

tpoisonooo commented 10 months ago

@WeiyaoLuo Try enabling wandb, magic ~

accelerate launch finetune.py \
    --output-dir output/yarn-7b-8k \
    --model NousResearch/Llama-2-7b-hf \
    --scaling-factor 2 \
    --learning-rate 0.00001 \
    --wandb ${YOUR_PROJECT} \
    --dataset emozilla/yarn-train-tokenized-8k-llama \
    --deepspeed

YL-9 commented 5 months ago

Hi, I'm running finetune.py on 2× A100. Both GPUs load up to about 14 GB/80 GB, then during the first batch memory climbs to 76 GB/80 GB, and once that batch finishes the second batch OOMs. Is this normal? >﹏<