anphex closed this issue 1 year ago
I'm encountering the same problem on Ubuntu 20 with a 3080 Ti and a 1060 6GB.
ok, I'm dumb and didn't use thousands for MB
I found the issue. It works if you specify the index of the final layer loaded onto each GPU instead of the number of layers to load onto each GPU (from llama_inference_offload.py in GPTQ-for-LLaMa). I wanted to load 35 layers onto cuda:0 and 5 onto cuda:1, so the correct argument is --pre-layer 35 40, not --pre-layer 35 5.
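To illustrate the cumulative interpretation, here is a minimal sketch (pre_layer_args is a hypothetical helper for illustration, not code from llama_inference_offload.py) that converts per-GPU layer counts into the values --pre-layer actually expects:

```python
from itertools import accumulate

def pre_layer_args(layers_per_gpu):
    """Convert per-GPU layer counts into cumulative final-layer
    indices, which is what --pre-layer expects."""
    return list(accumulate(layers_per_gpu))

# 35 layers on cuda:0 and 5 more on cuda:1:
print(pre_layer_args([35, 5]))  # [35, 40] -> "--pre-layer 35 40"
```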
Ah, my bad. Missed the Auto- part. My issue was that I was confused by this and assumed that "the numbers" referred to "the number of layers" (as for memory in --gpu-split) and not e.g. "the numbers that describe the layer distribution".
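For contrast, the values in --gpu-split are per-GPU VRAM amounts in GB rather than layer counts. A small sketch of that format (parse_gpu_split is a hypothetical name for illustration, not the loader's actual parser):

```python
def parse_gpu_split(spec):
    """Parse a comma-separated --gpu-split value like "17,23"
    into per-GPU VRAM allocations in GB."""
    return [float(x) for x in spec.split(",")]

# e.g. the 17,23 split on two 3090s mentioned later in the thread:
print(parse_gpu_split("17,23"))  # [17.0, 23.0]
```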
Through some help on the TheBloke Discord we discovered that setting a rather large page file helps when you have low RAM (32 GB, for example), since the page file can be used as a buffer while loading the model into VRAM.
When using ExLlama(-HF) I can now load all 70b 4-bit models cleanly, but the split, especially in AutoGPTQ, doesn't seem to work ("work" meaning my expectation that the loader finds all specs by itself and sets the optimal split).
I can only emphasize that ExLlama does its job perfectly when using a 17,23 split on two 3090s. Didn't try GPTQ-for-LLaMa yet.
GPTQ-for-LLaMa needs that pre-layer setting, otherwise it will be slow. I don't think it supports the new attention in the 70b though.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
When using a small 3-8 bit model that could fit into a single GPU (3090) anyway, there are no issues, and Windows Task Manager also shows that both GPUs receive "something" in their VRAM in about equal amounts.
When using a bigger model that has to be split across two similar GPUs, TGI just falls back to loading it onto the CPU and into system memory, obviously failing. I tried about every setting: auto-devices, different splits, different flags - to no avail.
Is there an existing issue for this?
Reproduction
Use the latest TGI version via the .bat updater as of August 6th, 2023
Screenshot
Logs
System Info