Rest API for inference locally

h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

http://h2o.ai

Apache License 2.0

10.94k stars 1.2k forks source link

Rest API for inference locally #1563

Open mohamed-alired opened 2 months ago

mohamed-alired commented 2 months ago

hi I have installed h2ogpt locally, but I want to build a frontend app using it, so I was wondering if there's an API that I can consume, like one for ingestion and another for inference.

pseudotensor commented 2 months ago

An extensive gradio API exists, see: See readme_client.md and examples via test code like test_client_chat_stream_langchain_steps3

And a full chat OpenAI API that is REST capable exists, but no upload of file or other things exists yet. Is that what you are looking for?

mohamed-alired commented 2 months ago

What I am looking for is a fastapi rest API for the different ingestion techniques and a rag completion API so I can use H2OGPT as a backend rag for my frontend webUI. Also, I wish you included JSON metadata for filtering in ingestion and rag completion so we can choose the files to chat with.

abuyusif01 commented 2 months ago

hi @mohamed-alired

Am currently building something exactly like this. its still in development tho. U can certainly fork the repo or make PR's. the foundation is there. The project extends the official FastAPI Template so scalling and deploying wont really much of a husle.

check it out here: https://github.com/abuyusif01/h2ogpt-fast-api/tree/main/backend/app/h2ogpt

there's still alot things need to be done. Including a proper README and support Streaming the Response (I planed to get this done in this weekend)

Here is what we currently support:

Chat with on disk files (there's an endpoint to upload docs, and retrieve whats being uploaded, so u can select which doc to ingest)
Chat with user Created pipelines (Currently MongoDB streamed data)
Chat with Urls
Chat with Publications, We use OpenDoaj API and scihub to download the papers.

mohamed-alired commented 2 months ago

hi @abuyusif01 how are you? i am really busy so if i have some time i will definitely PR bit i can give you some recommendations like don't force the inference with users cause i may wanna use it on my existing project also i think you have to make it possible with local inference like llamaCpp or something else so it's completely locally

abuyusif01 commented 2 months ago

@mohamed-alired You're right we dont really need to enforce auth, hence its removal I also make it possible to local inference using llamaCPP.

Subsequently, i restructure the repo, write a readme and containerize the app. Its now easy to setup + extend check it here: https://github.com/abuyusif01/h2ogpt-fast-api

@pseudotensor Since gradio is relatively stable now, why not reference this in the readme. so other people can use it as a starting point.