[MiniLLM] sft of llama2-7b out of memory on V100

microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs

https://aka.ms/GeneralAI

MIT License


[MiniLLM] sft of llama2-7b out of memory on V100 #160

Closed yumath closed 9 months ago

yumath commented 9 months ago

I am using 1 node with 4 V100 (32G) GPUs to run SFT on llama2-7b with the script minillm/scripts/llama2/sft/sft_7B.sh, but I get an out-of-memory error. Should I use larger GPUs, or am I doing something wrong? (I ask because https://github.com/microsoft/LMOps/issues/91 was trained on V100s.)

t1101675 commented 9 months ago

Training llama2-7b requires at least 16 32G V100 GPUs (#91 was trained with 16 32G V100s).
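
A rough back-of-the-envelope sketch of why 4 GPUs fall short. It assumes mixed-precision Adam at 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance) and a best case where model states are split evenly across all GPUs. Neither assumption is confirmed by this thread or by the script, and the helper names are made up for illustration.

```python
# Estimate of training memory for model states only (no activations).
# ASSUMPTION (not from the thread): mixed-precision Adam, per parameter:
#   2 bytes fp16 weights + 2 bytes fp16 gradients
#   + 12 bytes fp32 optimizer state (master weights, momentum, variance)
BYTES_PER_PARAM = 2 + 2 + 12
GB = 1e9


def model_state_gb(num_params: float) -> float:
    """Memory for weights, gradients, and optimizer state, in GB."""
    return num_params * BYTES_PER_PARAM / GB


def total_gpu_gb(num_gpus: int, gb_per_gpu: float = 32.0) -> float:
    """Aggregate memory across all GPUs, in GB."""
    return num_gpus * gb_per_gpu


params = 7e9  # nominal parameter count for llama2-7b
states = model_state_gb(params)

for n in (4, 16):
    pool = total_gpu_gb(n)
    # ASSUMPTION: states are sharded evenly, so the whole pool is usable.
    print(
        f"{n} GPUs: {pool:.0f} GB total, {states:.0f} GB model states, "
        f"{pool - states:.0f} GB left for activations and buffers"
    )
```

Under these assumptions the model states alone take about 112 GB of the 128 GB available on 4 GPUs, which leaves very little headroom for activations, communication buffers, and framework overhead. Actual usage depends on the parallelism and precision settings in the script.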

yumath commented 9 months ago

Thanks for your reply!