jagilley opened this issue 1 year ago
Should be pretty doable. This model would run nicely on a T4 or equivalent hardware: https://huggingface.co/TheBloke/Llama-2-13B-GGUF
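For reference, here's a minimal sketch of how you might run one of those quants with llama-cpp-python, assuming you've already downloaded a Q4_K_M file from that repo. The filename and settings are assumptions, not something I've benchmarked on a T4:

```python
from llama_cpp import Llama

# A 4-bit 13B GGUF (~8 GB) fits comfortably in a T4's 16 GB of VRAM,
# so all layers can be offloaded to the GPU (n_gpu_layers=-1).
llm = Llama(
    model_path="llama-2-13b.Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,
    n_ctx=4096,
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```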
This may not actually be useful, though: quantization often works against batching, since GGUF/llama.cpp-style inference is tuned for single-stream, low-memory use rather than high-throughput batched serving.