turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

[feature request] LLAMA.CPP #476

Closed. 0wwafa closed this issue 4 months ago.

0wwafa commented 4 months ago

It would be great to be able to use this interface locally to run models on the CPU with llama.cpp.

turboderp commented 4 months ago

I'm not really sure how to process that. Are you talking about ExUI?

0wwafa commented 4 months ago

Yes, sorry... I realize now that I posted this in the wrong repository.

turboderp commented 4 months ago

It's fine.

ExUI is very closely integrated with ExLlamaV2, though. I do plan to switch it over to using tabbyAPI as a backend, which would mean it could also connect to other OAI-compatible servers, including ones running llama.cpp, or even ChatGPT. But I don't think I'll have time for that for at least a couple of months.
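
For context, a minimal sketch of what "OAI-compatible backend" means in practice: any such server (tabbyAPI, a llama.cpp server, or the OpenAI API itself) accepts the same `/v1/chat/completions` request, so a frontend only needs a configurable base URL and API key. This is not ExUI code; the URLs, ports, and model names below are illustrative placeholders.

```python
import requests

def chat(base_url: str, api_key: str, model: str, prompt: str) -> str:
    """Send one chat-completion request to any OpenAI-compatible server."""
    response = requests.post(
        f"{base_url}/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# Same client code, different backends (hypothetical local endpoints):
# chat("http://localhost:5000", "tabby-key", "my-exl2-model", "Hello")  # tabbyAPI
# chat("http://localhost:8080", "none", "my-gguf-model", "Hello")       # llama.cpp server
```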