tloen / llama-int8

Quantized inference code for LLaMA models
GNU General Public License v3.0

Error loading 65B on a single A100 80G with ~96G RAM #8

Open dpyneo opened 1 year ago

dpyneo commented 1 year ago

First of all, thank you very much for this great project. On a single A100 80G, when loading the fifth weight file of 65B, the process gets killed (signal). Is this because of insufficient memory? My system RAM is about 96G.
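For reference, a minimal way to confirm whether it was the kernel OOM killer (a rough sketch, assuming Linux and permission to read the kernel log via `dmesg`):

```python
# Scan the kernel ring buffer for OOM-killer entries (may need root).
import subprocess

log = subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout
for line in log.splitlines():
    if "Out of memory" in line or "oom-kill" in line:
        print(line)
```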

tloen commented 1 year ago

That seems likely. Have you tried increasing the size of your swapfile?
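A quick sanity check of how much headroom the loader actually has (a minimal sketch, assuming `psutil` is installed):

```python
# Print available RAM and swap before starting the load.
import psutil

vm = psutil.virtual_memory()
sw = psutil.swap_memory()
print(f"RAM:  {vm.available / 2**30:.1f} GiB free of {vm.total / 2**30:.1f} GiB")
print(f"Swap: {sw.free / 2**30:.1f} GiB free of {sw.total / 2**30:.1f} GiB")
```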

ChaoChungWu-Johnson commented 1 year ago

I'm also running into insufficient CPU RAM. Would you mind clarifying how much CPU RAM is required for the int8 version of the 13B, 33B, and 65B LLaMA models? I'd like to plan my hardware spec according to your advice. Thanks!

dpyneo commented 1 year ago

Sorry for the slow reply. The load times are: 13B (653 s), 30B (1672 s), and 65B (4242 s). Since the weights are on an external hard drive, loading may take a little longer than usual.

I also noticed that after loading the two weight files of 13B and then loading the four weight files of 30B, it jumps straight to the third weight file: the first two load immediately (presumably still cached). Loading 30B takes about 96G of RAM and about 37G of VRAM; after increasing swap to 120G, 30B loads without being killed. Loading 65B takes about 96G of RAM plus about 50G of swap, and about 70G of VRAM.

[screenshots: 13B, 30B, 65B]
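In case it helps others measure this, here is a rough sketch (assuming `torch` and `psutil`, and the usual `consolidated.NN.pth` shard layout; `weights/65B` is a placeholder path) for logging CPU-RAM growth per shard:

```python
# Log resident-set growth as each LLaMA checkpoint shard is loaded on CPU.
import glob
import os

import psutil
import torch

proc = psutil.Process(os.getpid())
for path in sorted(glob.glob("weights/65B/consolidated.*.pth")):
    shard = torch.load(path, map_location="cpu")
    rss_gib = proc.memory_info().rss / 2**30
    print(f"{os.path.basename(path)}: RSS now {rss_gib:.1f} GiB")
    del shard  # the real loader keeps shards around to merge them
```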