New feature: vRAM estimator!
To estimate VRAM usage:
```shell
gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --quant q4_k_m --context 2048 --kvcache q4_0 # For GGUF models
gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --bpw 5.0 --context 2048 --kvcache q4_0      # For exl2 models

# Estimated VRAM usage: 5.35 GB
```
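For a feel of where a figure like this comes from, here is a rough back-of-the-envelope sketch in Go. The architecture numbers are the published Llama-3-8B values, and the bits-per-weight figures are approximations I'm assuming for q4_k_m weights and a q4_0 k/v cache; this is not gollama's exact formula.

```go
package main

import "fmt"

func main() {
	// Illustrative assumptions (not gollama's internals):
	// Llama-3-8B architecture and approximate effective bits per weight/element.
	const (
		numParams  = 8.03e9 // total parameters
		weightBPW  = 4.85   // approx. effective bits per weight for q4_k_m
		layers     = 32     // num_hidden_layers
		kvHeads    = 8      // num_key_value_heads (GQA)
		headDim    = 128    // hidden_size / num_attention_heads
		contextLen = 2048
		kvBPW      = 4.5 // approx. bits per element for a q4_0 k/v cache
	)

	weightsGB := numParams * weightBPW / 8 / 1e9
	// K and V caches: 2 tensors x layers x context x kvHeads x headDim elements.
	kvElements := float64(2 * layers * contextLen * kvHeads * headDim)
	kvGB := kvElements * kvBPW / 8 / 1e9

	fmt.Printf("weights ~%.2f GB, kv cache ~%.2f GB, total ~%.2f GB\n",
		weightsGB, kvGB, weightsGB+kvGB)
}
```

Real usage comes out somewhat higher than this naive sum because of output buffers, activations, and runtime overhead, which presumably accounts for the gap to the 5.35 GB reported above.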
To calculate maximum context for a given memory constraint:
```shell
gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --quant q4_k_m --memory 6 --kvcache q8_0 # For GGUF models
gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --bpw 5.0 --memory 6 --kvcache q8_0      # For exl2 models

# Maximum context for 6.00 GB of memory: 5069
```
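The inverse calculation can be pictured as: subtract the fixed weight memory from the budget and divide by the per-token k/v cache cost. The sketch below uses the same simplified assumptions as the previous one (it is not gollama's actual solver) and ignores activations and other overheads, so it is far more optimistic than the 5069-token figure above; it only illustrates the direction of the calculation.

```go
package main

import "fmt"

func main() {
	// Simplified, assumed model: fixed weight memory plus a per-token k/v cost.
	const (
		budgetGB  = 6.0
		weightsGB = 4.87 // q4_k_m weights from the previous sketch
		layers    = 32
		kvHeads   = 8
		headDim   = 128
		kvBPW     = 8.5 // approx. bits per element for a q8_0 k/v cache
	)

	// Bytes of k/v cache needed per token of context (K and V tensors).
	perTokenBytes := float64(2*layers*kvHeads*headDim) * kvBPW / 8

	freeBytes := (budgetGB - weightsGB) * 1e9
	maxContext := int(freeBytes / perTokenBytes)

	fmt.Printf("~%d tokens of context fit in %.2f GB under this toy model\n",
		maxContext, budgetGB)
}
```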
To find the best BPW:
```shell
gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --memory 6 --quanttype gguf

# Best BPW for 6.00 GB of memory: IQ3_S
```
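The best-BPW search can be pictured as walking a table of quantisation types from highest to lowest effective bits per weight and keeping the largest one whose estimate fits the budget. The sketch below uses an approximate BPW table and the same toy weights-plus-k/v estimate, so it picks a more generous quant than the IQ3_S answer above, which presumably also budgets for a larger context and additional overheads; it only illustrates the search pattern.

```go
package main

import "fmt"

func main() {
	// Approximate effective bits per weight for a few GGUF quant types
	// (illustrative values, not gollama's table).
	quants := []struct {
		name string
		bpw  float64
	}{
		{"Q8_0", 8.5}, {"Q6_K", 6.56}, {"Q5_K_M", 5.69}, {"Q4_K_M", 4.85},
		{"IQ4_XS", 4.25}, {"Q3_K_M", 3.91}, {"IQ3_S", 3.44}, {"IQ2_M", 2.7},
	}

	const (
		budgetGB  = 6.0
		numParams = 8.03e9
		kvGB      = 0.16 // toy k/v allowance; the real estimate also adds overheads
	)

	// First fit wins: the table is ordered from highest to lowest quality.
	for _, q := range quants {
		weightsGB := numParams * q.bpw / 8 / 1e9
		if weightsGB+kvGB <= budgetGB {
			fmt.Printf("best fit under this toy model: %s (~%.2f GB)\n", q.name, weightsGB+kvGB)
			return
		}
	}
	fmt.Println("no quantisation type fits the budget")
}
```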
The vRAM estimator works by reading the model's architecture (layer count, hidden size, attention and k/v head counts) from its configuration and calculating the memory required for the quantised weights and the k/v cache at the requested context length and quantisation settings.
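Since the commands above take a Hugging Face model ID, the architecture details presumably come from the repo's config.json. Here is a minimal sketch of fetching and reading the relevant fields; the URL pattern and field names are standard Hugging Face conventions, and this is an assumption about the inputs rather than a copy of gollama's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Fields of a Hugging Face config.json relevant to a VRAM estimate.
type hfConfig struct {
	HiddenSize            int `json:"hidden_size"`
	NumHiddenLayers       int `json:"num_hidden_layers"`
	NumAttentionHeads     int `json:"num_attention_heads"`
	NumKeyValueHeads      int `json:"num_key_value_heads"`
	MaxPositionEmbeddings int `json:"max_position_embeddings"`
}

func main() {
	const model = "NousResearch/Hermes-2-Theta-Llama-3-8B"
	url := "https://huggingface.co/" + model + "/resolve/main/config.json"

	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var cfg hfConfig
	if err := json.NewDecoder(resp.Body).Decode(&cfg); err != nil {
		panic(err)
	}

	headDim := cfg.HiddenSize / cfg.NumAttentionHeads
	fmt.Printf("layers=%d kvHeads=%d headDim=%d maxCtx=%d\n",
		cfg.NumHiddenLayers, cfg.NumKeyValueHeads, headDim, cfg.MaxPositionEmbeddings)
}
```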