ChatGLM sharded models are not suited to DeepSpeed

liangwq / Chatglm_lora_multi-gpu

ChatGLM multi-GPU using DeepSpeed and


ChatGLM sharded models are not suited to DeepSpeed #20

Open kevinuserdd opened 1 year ago

kevinuserdd commented 1 year ago

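I tried loading models with DeepSpeed. With multiple GPUs it does launch multiple processes, but GPU memory usage does not normally multiply. Only in cases like ChatGLM, where the checkpoint is saved in shards, does memory usage multiply. See https://github.com/microsoft/DeepSpeed/issues/2379, which states: "For example, gpt-neo-x 20B takes about 40GB in RAM, and if you run this script with deepspeed --num_gpus 4 example.py --save_ckpt, you will end up using 4 * 40GB in RAM."

Is there currently any way to solve this problem?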
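To make the arithmetic in the quoted DeepSpeed example concrete, here is a minimal sketch. It assumes the simplest possible model of the problem: every launched process holds its own full copy of the weights in host RAM. The function name is hypothetical and nothing here calls DeepSpeed itself.

```python
def host_ram_if_each_rank_loads_full_copy(model_gb: float, num_procs: int) -> float:
    """Estimate total host RAM (GB) when every process keeps a full copy of the weights.

    Assumption: no sharing or sharding between processes, so usage scales linearly.
    """
    if model_gb < 0 or num_procs < 1:
        raise ValueError("model_gb must be >= 0 and num_procs must be >= 1")
    return model_gb * num_procs


if __name__ == "__main__":
    # Figures from the quoted example: about 40 GB per copy, 4 processes.
    for n in (1, 2, 4):
        total = host_ram_if_each_rank_loads_full_copy(40, n)
        print(f"{n} process(es): ~{total:.0f} GB")
```

With the figures from the quote (40 GB and 4 processes) this gives 160 GB, which is the "4 * 40GB" mentioned in the linked issue.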

liangwq commented 1 year ago

There is a way to solve this. It is not that chatgpt is unsuited to sharding; the code has some problems in how it is written.

kevinuserdd commented 1 year ago

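I did not say chatgpt is unsuited; I meant some models. For example, I tried bloom, llama, and chatglm. So far, chatglm is the one that shows this behavior at load time, and it ends up using several times the GPU memory.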

calebgithub commented 1 year ago

Has this problem been solved?

liding1992 commented 1 year ago

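Which models run correctly, meaning correct output, GPU memory spread evenly across the cards and lower per card than on a single GPU, and multi-GPU inference faster than single-GPU? Could you share the model names and the scripts you ran? Thanks.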