microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

[MiniLLM] sft of llama2-7b out of memory on V100 #160

Closed yumath closed 5 months ago

yumath commented 5 months ago

I am using 1 node with 4 V100 (32G) GPUs to run SFT on llama2-7b with the script minillm/scripts/llama2/sft/sft_7B.sh, but I get an out-of-memory error. Should I use larger GPUs, or am I doing something wrong? (I ask because https://github.com/microsoft/LMOps/issues/91 was trained on V100s.)

t1101675 commented 5 months ago

Training llama2-7b requires at least 16 32G V100 GPUs (#91 was trained with 16 32G V100s).
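
For a rough sense of why 4 GPUs run out of memory, here is a back-of-envelope sketch. It is not the maintainers' analysis, and every number in it is an assumption: roughly 6.74e9 parameters for llama2-7b, mixed-precision Adam at about 16 bytes of model state per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance), and model states sharded perfectly evenly across GPUs. The function names are hypothetical. Activations, communication buffers, and fragmentation are not counted, so real usage is higher.

```python
# Back-of-envelope estimate of per-GPU memory for model states only.
# All constants below are assumptions for illustration, not values
# taken from the MiniLLM scripts.

LLAMA2_7B_PARAMS = 6.74e9   # approximate parameter count (assumption)
BYTES_PER_PARAM = 16        # mixed-precision Adam: 2 + 2 + 12 bytes (assumption)
GPU_MEMORY_GB = 32          # nominal V100 capacity


def model_state_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Total memory for weights, gradients, and optimizer states, in GB."""
    return num_params * bytes_per_param / 1e9


def per_gpu_state_gb(num_params: float, num_gpus: int,
                     bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Model-state memory per GPU, assuming perfectly even sharding."""
    return model_state_gb(num_params, bytes_per_param) / num_gpus


if __name__ == "__main__":
    for n in (4, 16):
        used = per_gpu_state_gb(LLAMA2_7B_PARAMS, n)
        left = GPU_MEMORY_GB - used
        print(f"{n} GPUs: ~{used:.1f} GB of model states per GPU, "
              f"~{left:.1f} GB left for activations and buffers")
```

Under these assumptions, 4 GPUs leave only a few GB per device for activations even in the best case, while 16 GPUs leave most of each card free. If the training script shards less aggressively than assumed here, the 4-GPU case is worse still.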

yumath commented 5 months ago

Thanks for your reply!