xverse-ai / XVERSE-13B

XVERSE-13B: A multilingual large language model developed by XVERSE Technology Inc.
Apache License 2.0

Model loading fills RAM first, then moves to GPU memory? #10

Closed lokvke closed 1 year ago

lokvke commented 1 year ago

With model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).half().cuda(), RAM usage climbs to about 50 GB before anything starts loading into GPU memory. It feels like the model is loaded twice: first on the CPU, and only then moved to the GPU?

Switching to model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map='auto', trust_remote_code=True).eval() behaves normally.

underspirit commented 1 year ago

Yes. The first form loads the weights into RAM in float32 by default, then converts them to float16, and only then moves them onto the GPU. Your version loads the sharded checkpoint files one at a time and places them on the GPU directly, so peak RAM usage is much lower.
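The ~50 GB RAM spike is consistent with a full float32 copy of the weights: a 13B-parameter model needs roughly 4 bytes per parameter in float32 versus 2 bytes in float16. A quick back-of-the-envelope check (the 13-billion parameter count is approximate, not the model's exact size):

```python
# Approximate raw weight footprint of a ~13B-parameter model by dtype.
# NUM_PARAMS is an approximation for illustration, not the exact count.
NUM_PARAMS = 13_000_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weight_memory_gb(dtype: str, num_params: int = NUM_PARAMS) -> float:
    """Return the raw weight footprint in gigabytes for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gb(dtype):.0f} GB")
# float32 comes out to ~52 GB, matching the ~50 GB RAM spike reported above;
# float16 halves that to ~26 GB, which is why torch_dtype=torch.float16
# together with device_map='auto' avoids the large CPU-side allocation.
```

This is why passing torch_dtype=torch.float16 at load time matters: the weights are never materialized in float32 on the CPU in the first place.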