Closed halbtuerke closed 1 month ago
Thanks! Working on a fix now.
A new version should be available shortly that supports both huggingface/id and ollama:model formats.
Let me know / re-open if it doesn't work for you!
go install github.com/sammcj/gollama@v1.26.0
gl --vram llama3.1:8b-instruct-q6_K --fits 14
📊 VRAM Estimation for Model: llama3.1:8b-instruct-q6_K
| QUANT \ CTX | BPW | 2K | 8K | 16K | 32K | 49K | 64K |
|-----------|------|-----|-----|-----------------|-----------------|-----------------|-----------------|
| IQ1_S | 1.56 | 2.2 | 2.8 | 3.7(3.7,3.7) | 5.5(5.5,5.5) | 7.3(7.3,7.3) | 9.1(9.1,9.1) |
| IQ2_XXS | 2.06 | 2.6 | 3.3 | 4.3(4.3,4.3) | 6.1(6.1,6.1) | 7.9(7.9,7.9) | 9.8(9.8,9.8) |
| IQ2_XS | 2.31 | 2.9 | 3.6 | 4.5(4.5,4.5) | 6.4(6.4,6.4) | 8.2(8.2,8.2) | 10.1(10.1,10.1) |
| IQ2_S | 2.50 | 3.1 | 3.8 | 4.7(4.7,4.7) | 6.6(6.6,6.6) | 8.5(8.5,8.5) | 10.4(10.4,10.4) |
| IQ2_M | 2.70 | 3.2 | 4.0 | 4.9(4.9,4.9) | 6.8(6.8,6.8) | 8.7(8.7,8.7) | 10.6(10.6,10.6) |
| IQ3_XXS | 3.06 | 3.6 | 4.3 | 5.3(5.3,5.3) | 7.2(7.2,7.2) | 9.2(9.2,9.2) | 11.1(11.1,11.1) |
| IQ3_XS | 3.30 | 3.8 | 4.5 | 5.5(5.5,5.5) | 7.5(7.5,7.5) | 9.5(9.5,9.5) | 11.4(11.4,11.4) |
| Q2_K | 3.35 | 3.9 | 4.6 | 5.6(5.6,5.6) | 7.6(7.6,7.6) | 9.5(9.5,9.5) | 11.5(11.5,11.5) |
| Q3_K_S | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7) | 7.7(7.7,7.7) | 9.7(9.7,9.7) | 11.7(11.7,11.7) |
| IQ3_S | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7) | 7.7(7.7,7.7) | 9.7(9.7,9.7) | 11.7(11.7,11.7) |
| IQ3_M | 3.70 | 4.2 | 5.0 | 6.0(6.0,6.0) | 8.0(8.0,8.0) | 9.9(9.9,9.9) | 12.0(12.0,12.0) |
| Q3_K_M | 3.91 | 4.4 | 5.2 | 6.2(6.2,6.2) | 8.2(8.2,8.2) | 10.2(10.2,10.2) | 12.2(12.2,12.2) |
| IQ4_XS | 4.25 | 4.7 | 5.5 | 6.5(6.5,6.5) | 8.6(8.6,8.6) | 10.6(10.6,10.6) | 12.7(12.7,12.7) |
| Q3_K_L | 4.27 | 4.7 | 5.5 | 6.5(6.5,6.5) | 8.6(8.6,8.6) | 10.7(10.7,10.7) | 12.7(12.7,12.7) |
| IQ4_NL | 4.50 | 5.0 | 5.7 | 6.8(6.8,6.8) | 8.9(8.9,8.9) | 10.9(10.9,10.9) | 13.0(13.0,13.0) |
| Q4_0 | 4.55 | 5.0 | 5.8 | 6.8(6.8,6.8) | 8.9(8.9,8.9) | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_S | 4.58 | 5.0 | 5.8 | 6.9(6.9,6.9) | 8.9(8.9,8.9) | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_M | 4.85 | 5.3 | 6.1 | 7.1(7.1,7.1) | 9.2(9.2,9.2) | 11.4(11.4,11.4) | 13.5(13.5,13.5) |
| Q4_K_L | 4.90 | 5.3 | 6.1 | 7.2(7.2,7.2) | 9.3(9.3,9.3) | 11.4(11.4,11.4) | 13.6(13.6,13.6) |
| Q5_0 | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8) | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_S | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8) | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_M | 5.69 | 6.1 | 6.9 | 8.0(8.0,8.0) | 10.2(10.2,10.2) | 12.4(12.4,12.4) | 14.6(14.6,14.6) |
| Q5_K_L | 5.75 | 6.1 | 7.0 | 8.1(8.1,8.1) | 10.3(10.3,10.3) | 12.5(12.5,12.5) | 14.7(14.7,14.7) |
| Q6_K | 6.59 | 7.0 | 8.0 | 9.4(9.4,9.4) | 12.2(12.2,12.2) | 15.0(15.0,15.0) | 17.8(17.8,17.8) |
| Q8_0 | 8.50 | 8.8 | 9.9 | 11.4(11.4,11.4) | 14.4(14.4,14.4) | 17.4(17.4,17.4) | 20.3(20.3,20.3) |
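For context, the figures above roughly follow from the quant's bits-per-weight plus the KV cache. Here is a minimal sketch of that arithmetic — the parameter count, layer/head geometry, and fp16 KV cache are assumptions for an 8B Llama-style model, and gollama's actual estimator accounts for more overheads than this:

```go
package main

import "fmt"

// estimateVRAMGB gives a rough VRAM figure in GB for a decoder-only model:
// quantised weights (params * bits-per-weight / 8) plus an fp16 KV cache
// (2 tensors * layers * kvHeads * headDim * 2 bytes, per token of context).
func estimateVRAMGB(params, bpw float64, layers, kvHeads, headDim, ctx int) float64 {
	weights := params * bpw / 8 / 1e9
	kvBytesPerToken := float64(2 * layers * kvHeads * headDim * 2)
	kvCache := kvBytesPerToken * float64(ctx) / 1e9
	return weights + kvCache
}

func main() {
	// Assumed Llama-3.1-8B-style geometry: 32 layers, 8 KV heads, head dim 128.
	v := estimateVRAMGB(8.03e9, 6.59, 32, 8, 128, 8192)
	fmt.Printf("Q6_K @ 8K context: ~%.1f GB\n", v)
}
```

This lands in the same ballpark as the Q6_K / 8K cell above once you add runtime overhead (compute graph, activations), which the real estimator folds in.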
and
gl --vram NousResearch/Hermes-2-Theta-Llama-3-8B --fits 20
📊 VRAM Estimation for Model: NousResearch/Hermes-2-Theta-Llama-3-8B
| QUANT \ CTX | BPW | 2K | 8K | 16K | 32K | 49K | 64K |
|-----------|------|-----|------|-----------------|-----------------|-----------------|-----------------|
| IQ1_S | 1.56 | 2.4 | 3.8 | 5.7(4.7,4.2) | 9.5(7.5,6.5) | 13.3(10.3,8.8) | 17.1(13.1,11.1) |
| IQ2_XXS | 2.06 | 2.9 | 4.3 | 6.3(5.3,4.8) | 10.1(8.1,7.1) | 13.9(10.9,9.4) | 17.8(13.8,11.8) |
| IQ2_XS | 2.31 | 3.1 | 4.6 | 6.5(5.5,5.0) | 10.4(8.4,7.4) | 14.2(11.2,9.8) | 18.1(14.1,12.1) |
| IQ2_S | 2.50 | 3.3 | 4.8 | 6.7(5.7,5.2) | 10.6(8.6,7.6) | 14.5(11.5,10.0) | 18.4(14.4,12.4) |
| IQ2_M | 2.70 | 3.5 | 5.0 | 6.9(5.9,5.4) | 10.8(8.8,7.8) | 14.7(11.7,10.2) | 18.6(14.6,12.6) |
| IQ3_XXS | 3.06 | 3.8 | 5.3 | 7.3(6.3,5.8) | 11.2(9.2,8.2) | 15.2(12.2,10.7) | 19.1(15.1,13.1) |
| IQ3_XS | 3.30 | 4.1 | 5.5 | 7.5(6.5,6.0) | 11.5(9.5,8.5) | 15.5(12.5,11.0) | 19.4(15.4,13.4) |
| Q2_K | 3.35 | 4.1 | 5.6 | 7.6(6.6,6.1) | 11.6(9.6,8.6) | 15.5(12.5,11.0) | 19.5(15.5,13.5) |
| IQ3_S | 3.50 | 4.3 | 5.8 | 7.7(6.7,6.2) | 11.7(9.7,8.7) | 15.7(12.7,11.2) | 19.7(15.7,13.7) |
| Q3_K_S | 3.50 | 4.3 | 5.8 | 7.7(6.7,6.2) | 11.7(9.7,8.7) | 15.7(12.7,11.2) | 19.7(15.7,13.7) |
| IQ3_M | 3.70 | 4.5 | 6.0 | 8.0(7.0,6.5) | 11.9(9.9,8.9) | 15.9(12.9,11.4) | 20.0(16.0,14.0) |
| Q3_K_M | 3.91 | 4.7 | 6.2 | 8.2(7.2,6.7) | 12.2(10.2,9.2) | 16.2(13.2,11.7) | 20.2(16.2,14.2) |
| IQ4_XS | 4.25 | 5.0 | 6.5 | 8.5(7.5,7.0) | 12.6(10.6,9.6) | 16.6(13.6,12.1) | 20.7(16.7,14.7) |
| Q3_K_L | 4.27 | 5.0 | 6.5 | 8.5(7.5,7.0) | 12.6(10.6,9.6) | 16.6(13.7,12.2) | 20.7(16.7,14.7) |
| IQ4_NL | 4.50 | 5.2 | 6.7 | 8.8(7.8,7.3) | 12.9(10.9,9.9) | 16.9(13.9,12.4) | 21.0(17.0,15.0) |
| Q4_0 | 4.55 | 5.2 | 6.8 | 8.8(7.8,7.3) | 12.9(10.9,9.9) | 17.0(14.0,12.5) | 21.1(17.1,15.1) |
| Q4_K_S | 4.58 | 5.3 | 6.8 | 8.9(7.9,7.4) | 12.9(10.9,9.9) | 17.0(14.0,12.5) | 21.1(17.1,15.1) |
| Q4_K_M | 4.85 | 5.5 | 7.1 | 9.1(8.1,7.6) | 13.2(11.2,10.2) | 17.4(14.4,12.9) | 21.5(17.5,15.5) |
| Q4_K_L | 4.90 | 5.6 | 7.1 | 9.2(8.2,7.7) | 13.3(11.3,10.3) | 17.4(14.4,12.9) | 21.6(17.6,15.6) |
| Q5_K_S | 5.54 | 6.2 | 7.8 | 9.8(8.8,8.3) | 14.0(12.0,11.0) | 18.2(15.2,13.7) | 22.4(18.4,16.4) |
| Q5_0 | 5.54 | 6.2 | 7.8 | 9.8(8.8,8.3) | 14.0(12.0,11.0) | 18.2(15.2,13.7) | 22.4(18.4,16.4) |
| Q5_K_M | 5.69 | 6.3 | 7.9 | 10.0(9.0,8.5) | 14.2(12.2,11.2) | 18.4(15.4,13.9) | 22.6(18.6,16.6) |
| Q5_K_L | 5.75 | 6.4 | 8.0 | 10.1(9.1,8.6) | 14.3(12.3,11.3) | 18.5(15.5,14.0) | 22.7(18.7,16.7) |
| Q6_K | 6.59 | 7.2 | 9.0 | 11.4(10.4,9.9) | 16.2(14.2,13.2) | 21.0(18.0,16.5) | 25.8(21.8,19.8) |
| Q8_0 | 8.50 | 9.1 | 10.9 | 13.4(12.4,11.9) | 18.4(16.4,15.4) | 23.4(20.4,18.9) | 28.3(24.3,22.3) |
Thank you!
Description
Tried using the VRAM estimator but I'm getting a 401 Unauthorized error message.
How to reproduce
gollama --vram --model gemma2:2b-instruct-q8_0 --quant q8_0 --context 8096
Output
Error calculating VRAM: bad status: 401 Unauthorized
Environment
go install github.com/sammcj/gollama@HEAD
go version go1.22.5 darwin/arm64
Can you contribute?
No