DeepSpeed config crashes on `auto` values, then OOM

tpoisonooo commented 10 months ago

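1. With DeepSpeed

The config file deepspeed/zero3.json raises an error because the `auto` values are not accepted. I edited the config myself; I am not sure whether my changes are correct, but I wanted to get it running first:

Command used:

accelerate launch finetune.py \
    --output-dir output/yarn-7b-32k \
    --model NousResearch/Llama-2-7b-hf \
    --learning-rate 0.00001 \
    --lr-schedule constant \
    --scaling-factor 8 \
    --deepspeed

It then fails with OOM.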
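One way to sidestep errors about `auto` is to write concrete values into the config before launching. The sketch below is only an illustration: the helper name `resolve_auto`, the sample config, and the batch settings are hypothetical and not taken from the repository. The three bucket sizes follow the hidden-size-based defaults described in the Hugging Face DeepSpeed integration docs, with hidden size 4096 assumed for Llama-2-7b.

```python
import json


def resolve_auto(config, replacements):
    """Return a copy of `config` with every "auto" value replaced.

    `replacements` maps a key name to its concrete value. A key that is
    still "auto" and has no replacement raises KeyError, so nothing is
    silently left unresolved.
    """
    def walk(node):
        if isinstance(node, dict):
            out = {}
            for key, value in node.items():
                if value == "auto":
                    if key not in replacements:
                        raise KeyError(f"no concrete value given for '{key}'")
                    out[key] = replacements[key]
                else:
                    out[key] = walk(value)
            return out
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(config)


# Illustrative ZeRO-3 config using "auto" placeholders (not the repo's file).
zero3_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

hidden_size = 4096  # assumption: hidden size of Llama-2-7b

resolved = resolve_auto(
    zero3_config,
    {
        "enabled": True,                       # assumption: bf16 training
        "reduce_bucket_size": hidden_size * hidden_size,
        "stage3_prefetch_bucket_size": int(0.9 * hidden_size * hidden_size),
        "stage3_param_persistence_threshold": 10 * hidden_size,
        "gradient_accumulation_steps": 1,      # assumption, adjust as needed
        "gradient_clipping": 1.0,              # assumption, adjust as needed
        "train_micro_batch_size_per_gpu": 1,   # assumption, adjust as needed
    },
)

print(json.dumps(resolved, indent=2))
```

Raising on any unresolved key is deliberate: a leftover `auto` would otherwise only surface later as a launch-time error.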

2. Without DeepSpeed

I disabled DeepSpeed and dynamo in `accelerate config`. The first configuration in the default train.sh appears to be the 64k-length one, and it also fails with OOM:

# run `accelerate config` first. pass --deepspeed to finetune.py if using DeepSpeed

accelerate launch finetune.py \
    --output-dir output/yarn-7b-64k \
    --model NousResearch/Llama-2-7b-hf

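3. Code error

Does the DDP code need to be changed? This is my first time using it, so I am not sure whether it is correct.

Question

I checked, and x.shape is torch.Size([1, 65536, 4096]); even a single GPU with 80G of memory does not seem to be enough.

So should tensor parallelism (tp) be configured somewhere? The README does not seem very friendly to newcomers, though QAQ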
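A back-of-the-envelope check of that shape, as a rough sketch only. It assumes bfloat16 activations, the 32 decoder layers and 32000-token vocabulary of Llama-2-7b, and that at least one hidden-state tensor per layer is kept for the backward pass. Real usage is higher, since attention and MLP intermediates are ignored here.

```python
# Shape reported in the thread: (batch, sequence length, hidden size).
batch, seq_len, hidden = 1, 65536, 4096

BYTES_BF16 = 2       # assumption: activations stored in bfloat16
NUM_LAYERS = 32      # decoder layers in Llama-2-7b
VOCAB_SIZE = 32000   # Llama-2 tokenizer vocabulary size
GIB = 1024 ** 3

# One hidden-state tensor of the reported shape.
hidden_state_bytes = batch * seq_len * hidden * BYTES_BF16

# Lower bound: one such tensor saved per decoder layer for backward.
per_layer_floor_bytes = hidden_state_bytes * NUM_LAYERS

# Output logits over the full vocabulary at every position.
logits_bytes = batch * seq_len * VOCAB_SIZE * BYTES_BF16

print(f"one hidden state : {hidden_state_bytes / GIB:.2f} GiB")
print(f"32-layer floor   : {per_layer_floor_bytes / GIB:.2f} GiB")
print(f"logits           : {logits_bytes / GIB:.2f} GiB")
```

Under these assumptions a single hidden state is 0.5 GiB, the per-layer floor is 16 GiB, and the logits add about 3.9 GiB, all before counting weights, gradients, optimizer states, or any intermediate tensors inside each layer.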

WeiyaoLuo commented 10 months ago

I am in the same situation as you 🥴, and the code does not seem to offer a way to enable model parallelism. I have 8x 40G A100s and always hit OOM. How should I configure this? Otherwise there is no way to fine-tune. (I have tried a smaller 8k dataset and a smaller Llama model, but it still runs out of memory.) @tpoisonooo
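To judge whether sharding alone could help on 8x 40G cards, the per-GPU cost of weights, gradients, and Adam states can be estimated with the usual mixed-precision accounting of 16 bytes per parameter (2 for weights, 2 for gradients, 12 for optimizer states), split across GPUs according to the ZeRO stage. This is a sketch under assumptions: the function name is hypothetical, the parameter count of about 6.74 billion for Llama-2-7b is approximate, and activations are not included.

```python
def model_state_bytes_per_gpu(num_params, num_gpus, zero_stage):
    """Estimate bytes per GPU for weights, gradients, and Adam states.

    Assumes mixed-precision Adam: 2 bytes/param for weights, 2 for
    gradients, 12 for optimizer states. Activations are excluded.
    """
    weights = 2 * num_params
    grads = 2 * num_params
    optim = 12 * num_params
    if zero_stage >= 1:   # stage 1 shards optimizer states
        optim /= num_gpus
    if zero_stage >= 2:   # stage 2 also shards gradients
        grads /= num_gpus
    if zero_stage >= 3:   # stage 3 also shards weights
        weights /= num_gpus
    return weights + grads + optim


GIB = 1024 ** 3
LLAMA2_7B_PARAMS = 6.74e9  # assumption: approximate parameter count

for stage in (0, 1, 2, 3):
    per_gpu = model_state_bytes_per_gpu(LLAMA2_7B_PARAMS, 8, stage)
    print(f"ZeRO stage {stage}: {per_gpu / GIB:.1f} GiB per GPU")
```

If this estimate holds, ZeRO stage 3 on 8 GPUs puts the model states at roughly 12.6 GiB per card, which suggests the remaining pressure comes from activations at long sequence lengths rather than from the parameters themselves.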

tpoisonooo commented 10 months ago

@WeiyaoLuo Try enabling wandb, magic ~

accelerate launch finetune.py \
    --output-dir output/yarn-7b-8k \
    --model NousResearch/Llama-2-7b-hf \
    --scaling-factor 2 \
    --learning-rate 0.00001 \
    --wandb ${YOUR_PROJECT} \
    --dataset emozilla/yarn-train-tokenized-8k-llama \
    --deepspeed

YL-9 commented 5 months ago

Hello, I ran finetune.py on 2x A100. Both GPUs loaded to 14G/80G, then memory rose to 76G/80G during the first batch. After that finished, the second batch hit OOM. Is this behavior normal? ＞﹏＜

jquesnelle / yarn