Open · SoftologyPro opened 4 months ago
I see this in the readme:

"Supports EXL2, GPTQ and FP16 models"

but there are no links to the models themselves. Can you give me the HF URLs for those recommended models, or the models you think are "best" for use with ExUI (with a 24 GB VRAM GPU)? Thanks.

Reply:

I have a bunch of EXL2 models on HF here. Other than that, you can simply search for "EXL2" or "GPTQ" on HF and you should get lots of results. The FP16 format is just the standard HF format, so supported architectures will load without any quantization if you have enough VRAM.

Which model to choose depends on what you're trying to do. Gemma-2-27B-it is a very strong model that should work well on a 24 GB GPU at about 4-5 bpw.
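For anyone landing here later, a minimal sketch (not from the thread itself) of how this works in practice: a back-of-the-envelope check that a given bits-per-weight figure fits in VRAM, plus pulling one bitrate of an EXL2 quant from the Hub with huggingface_hub. The repo id and the "4.0bpw" branch below are placeholders, not links confirmed in this issue; check the model card for the branches that actually exist.

```python
from huggingface_hub import snapshot_download

def weight_vram_gib(n_params_billion: float, bpw: float) -> float:
    """Approximate VRAM needed for the weights alone, in GiB.

    Ignores KV cache and activation overhead, so leave a few GB of headroom.
    """
    bits = n_params_billion * 1e9 * bpw
    return bits / 8 / 1024**3

# ~27B parameters at 4.5 bpw -> roughly 14 GiB of weights, which leaves room
# for the context cache on a 24 GB card.
print(f"{weight_vram_gib(27, 4.5):.1f} GiB")

# Download a specific bitrate of an EXL2 quant. EXL2 repos often keep each
# bitrate on its own branch, selected via `revision`.
snapshot_download(
    repo_id="turboderp/gemma-2-27b-it-exl2",   # placeholder repo id
    revision="4.0bpw",                         # placeholder branch name
    local_dir="models/gemma-2-27b-it-exl2-4.0bpw",
)
```

The downloaded directory can then be pointed to from ExUI's model loader like any other local model folder.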