turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

[Feature Request] OpenAI-compatible API #212

Closed · langchain4j closed this issue 1 year ago

langchain4j commented 1 year ago

Hello and thank you so much for this great project!

Could you please add an API compatible with OpenAI's? Lots of existing tools use OpenAI as an LLM provider, and it would be very easy for them to switch to local models hosted with exllama if there were an OpenAI-compatible API.
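
For illustration, this is roughly what such tools send today (a sketch using the requests library; the local URL and model name are placeholders, and the JSON shape follows OpenAI's standard chat completions API):

```python
# What an OpenAI-compatible endpoint would need to accept and return.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # hypothetical local server
    json={
        "model": "llama-7b",  # placeholder; a local server may ignore this
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.7,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```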

BTW, there is a very popular project, LocalAI, which provides an OpenAI-compatible API, but their inference speed is not as good as exllama's...

Thank you for your consideration!

dspasyuk commented 1 year ago

+1 for this one too. There is already a FastAPI server to be found in the issues, which works well, but it has a dependency on Rust (a nightmare to build) and, as far as I know, does not use the OpenAI syntax.

SinanAkkoyun commented 1 year ago

I am working on something similar and will eventually open a PR, but it may take a long time.

SinanAkkoyun commented 1 year ago

Is the streaming response the main thing you are looking for in an OpenAI-compatible API? I might PR a non-OpenAI API for this repo sooner; you could then write a small adapter and modify your code to work with it.
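
Something like this, for example (a hypothetical adapter sketch; the /stream route, request payload, and raw-token framing are assumptions, not an existing exllama API):

```python
# Hypothetical adapter: wrap a plain token-stream endpoint so that existing
# OpenAI-style clients receive familiar chat.completion.chunk objects.
import json
import requests

def openai_style_chunks(prompt, base_url="http://localhost:8000"):
    """Yield OpenAI-style chunk dicts from a raw streaming endpoint."""
    with requests.post(f"{base_url}/stream", json={"prompt": prompt}, stream=True) as r:
        r.raise_for_status()
        for raw in r.iter_content(chunk_size=None):  # bytes as they arrive
            # Naive decode; a real adapter would buffer partial UTF-8 sequences.
            yield {
                "object": "chat.completion.chunk",
                "choices": [{"index": 0, "delta": {"content": raw.decode("utf-8")}}],
            }

if __name__ == "__main__":
    for chunk in openai_style_chunks("Hello"):
        print(json.dumps(chunk))
```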

langchain4j commented 1 year ago

@SinanAkkoyun I am looking for full compatibility.

arbi-dev commented 1 year ago

An OpenAI-compatible API for exllama (and other loaders) is already available via ooba's text-generation-webui. You can use all your existing OpenAI scripts by just setting openai.api_base = "http://0.0.0.0:5001/v1", assuming you run the model locally and have checked the "openai" box in the text-generation-webui interface.
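
For example (a minimal sketch using the pre-1.0 openai Python client; the model name is arbitrary, since the webui serves whichever model is currently loaded):

```python
# Point an existing OpenAI script at the local text-generation-webui server.
import openai

openai.api_base = "http://0.0.0.0:5001/v1"  # the webui's "openai" extension
openai.api_key = "dummy"  # local servers typically ignore the key

completion = openai.ChatCompletion.create(
    model="anything",  # the webui uses the currently loaded model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion["choices"][0]["message"]["content"])
```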

turboderp commented 1 year ago

I've added "consider an OpenAI-compatible server" to the V2 roadmap, but for now I think adding more complexity to V1 is a waste of effort, especially when text-generation-webui already has what seems to be a very complete implementation that can use ExLlama as a backend.

SinanAkkoyun commented 1 year ago

> but for now I think adding more complexity to V1 is a waste of effort

Will V2 be a new repo, or will V1 stay as an old branch with V2 becoming the new master branch in the future?

turboderp commented 1 year ago

New repo, I think. There's basically nothing of the original code in it.

SinanAkkoyun commented 1 year ago

Okay :)

lhl commented 1 year ago

Just as an FYI for anyone searching: I haven't tested it out, but this project is pretty active and claims Exllama support: https://github.com/c0sogi/llama-api

ehartford commented 11 months ago

I was really expecting something similar to llama-cpp-python's feature that stands up an API:

https://github.com/abetlen/llama-cpp-python#web-server

https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/server/app.py
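
For reference, a rough sketch of what that looks like (the model path is a placeholder; see the linked README for authoritative usage):

```python
# llama-cpp-python's server is launched from the shell per the linked README:
#   pip install 'llama-cpp-python[server]'
#   python3 -m llama_cpp.server --model models/7B/ggml-model.bin
# It then exposes an OpenAI-compatible API (default port 8000), so a
# pre-1.0 openai client can talk to it directly:
import openai

openai.api_base = "http://localhost:8000/v1"
openai.api_key = "none"  # local servers typically ignore the key

result = openai.Completion.create(model="local", prompt="Hello")
print(result["choices"][0]["text"])
```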