Open lingyezhixing opened 1 month ago
Same situation here! This issue makes me have to stay on 0.2.1.
I am also stuck on version 0.2.1.
Same here. WSL2 + NVIDIA GPU
Is there any update? With 0.3.0 I still get:
offloading 79 repeating layers to GPU
llm_load_tensors: offloaded 79/81 layers to GPU
for qwen2:
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = Q3_K - Large
llm_load_print_meta: model params = 72.71 B
llm_load_print_meta: model size = 36.79 GiB (4.35 BPW)
llm_load_print_meta: general.name = Qwen2-72B-Instruct
With 0.2.1 I could load it all into VRAM.
I added "num_gpu": 81 to the model's params file and now it loads all of it!
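For anyone wanting to try the same workaround: a minimal sketch of what that params override might look like, assuming the model's params file accepts the same JSON option names as Ollama's API `options` (the value 81 matches the 81 total layers reported in the `offloaded 79/81 layers to GPU` log line above):

```json
{
  "num_gpu": 81
}
```

The same option can reportedly also be set per-request via the API `options` field or in a Modelfile with `PARAMETER num_gpu 81`; forcing it to the model's full layer count overrides the new automatic layer-count estimate that seems to under-allocate here.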
What is the issue?
My graphics card is a laptop 4060 with only 8GB of VRAM. Interestingly, even before the update, none of the models actually used the full capacity of my GPU memory.
OS
Windows
GPU
Nvidia
CPU
AMD
Ollama version
0.2.3