mzbac / AutoGPTQ-API

Host a GPTQ model using AutoGPTQ as an API that is compatible with the text generation UI API.

Requirements and Speed #7

Open LoggeL opened 11 months ago

LoggeL commented 11 months ago

Very interesting project 😄 Thanks for doing this. Do you have a rough estimate of the hardware required and the speed you get on specific hardware?

Edit: Found the RAM requirements but don't have the hardware to judge the speed. https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GGML#provided-files

mzbac commented 11 months ago

I use a 4090 for it. Speed-wise, I think you're better off using exllama or exllamav2, which have the best speed with GPU acceleration. https://github.com/turboderp/exllama https://github.com/turboderp/exllamav2