Closed: nmiletic closed this issue 1 year ago
Thank you for this very interesting project. Is the backend code also going open source?

Thank you, @nmiletic. Not yet, but I will clean it up and upload it later. It is basically a uvicorn server that receives the request and dispatches inference requests to another uvicorn server that marshals the LLM models on the GPU. Hope this helps :)
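For anyone curious what that two-process layout might look like in practice, here is a minimal sketch assuming FastAPI and httpx. Since the actual backend has not been published, every name, port, and endpoint below is an assumption for illustration only:

```python
# Hypothetical sketch of the two-uvicorn-server layout described above.
# The real backend is not public; names, ports, and endpoints are assumed.

# --- inference server (run with: uvicorn this_module:inference_app --port 9000) ---
# Holds the LLM on the GPU and serializes access to it.
from fastapi import FastAPI
from pydantic import BaseModel


class Prompt(BaseModel):
    text: str


inference_app = FastAPI()


@inference_app.post("/generate")
def run_inference(prompt: Prompt):
    # Placeholder: the real server would call the loaded model here,
    # e.g. output = model.generate(prompt.text)
    return {"text": f"echo: {prompt.text}"}


# --- gateway (run with: uvicorn this_module:gateway_app --port 8000) ---
# Receives client requests and forwards them to the inference server.
import httpx

INFERENCE_URL = "http://127.0.0.1:9000/generate"  # assumed address

gateway_app = FastAPI()


@gateway_app.post("/generate")
async def generate(prompt: Prompt):
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(INFERENCE_URL, json={"text": prompt.text})
        resp.raise_for_status()
        return resp.json()
```

Presumably the point of the split is that the model is loaded onto the GPU once in a single process, while the public-facing server stays responsive and simply queues requests behind it.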