@junrushao Could you tell me your thoughts on this? If you guys really want to develop API support for this, I would love to contribute here.
@tqchen happy to have your insights on this as well.
Would love to hear from the community about what kinds of developer APIs we are looking for.
At the moment, libmlc_llm can indeed be leveraged as a local API; that is how we build the iOS app. We plan to document this part more clearly in the coming weeks. If you have more suggestions on APIs we could build (e.g. something related to the OpenAI API), we would love to hear about them and support the community.
As a builder, my scenario is a modular self-hosted "Host" that uses services to accomplish whatever kind of task I implement (conversational chat, auto-gpt-like agents, image generation, and so on). The host is just an orchestrator; the actual work is done by specialized services (e.g. stable diffusion could use the Automatic1111 API, or DALL-E, or whatever). For what matters to you, this is of course about text generation, so the API should expose what the model is capable of:
- Can it generate embeddings? Then an API for embeddings.
- Can it generate completions? Then an API for completions.
- Is it suitable for chat completion? Then maybe a specialized chat completion API.
Since we all know who the big player in LLMs actually is, mimicking the OpenAI API could be a safe choice: it would make it easier for builders to just switch from OpenAI to your model. Sounds good?
To be explicit: I would leave out more complex use cases like edits (https://platform.openai.com/docs/api-reference/edits/create) if the model can't perform them, and also the fine-tuning APIs, at least in my case.
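To make the OpenAI-compatible idea concrete, here is a minimal client-side sketch in Python using `requests`. The base URL, port, and model name are illustrative assumptions, not existing MLC endpoints; the point is that a builder could switch from OpenAI just by changing the base URL:

```python
import requests

# Hypothetical local server; the base URL and port are assumptions.
BASE_URL = "http://localhost:8000/v1"

# Completion call mirroring the shape of OpenAI's /v1/completions.
completion = requests.post(
    f"{BASE_URL}/completions",
    json={
        "model": "vicuna-7b",  # assumed local model name
        "prompt": "Explain TVM in one sentence.",
        "max_tokens": 64,
        "temperature": 0.7,
    },
).json()
print(completion["choices"][0]["text"])

# Embedding call mirroring the shape of OpenAI's /v1/embeddings.
embedding = requests.post(
    f"{BASE_URL}/embeddings",
    json={"model": "vicuna-7b", "input": "Text to embed"},
).json()
print(len(embedding["data"][0]["embedding"]))
```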
Came looking for a way to add FastAPI endpoints around the input/output text. I would love to create a web UI, but I think your underlying application should focus on the core and simply open up some endpoints so a client with a unique token can hold a cohesive conversation.
EDIT: I am potentially interested in helping build them out. I wrote a completion app using BLOOM; it splits the UI from the completion work done in the backend, and the backend keeps each client's thread distinct.
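For reference, a rough sketch of that split, assuming a FastAPI front end. The endpoint name, the token scheme, and the `generate()` stub (standing in for the compiled model backend) are all illustrative assumptions, not an existing API:

```python
import uuid
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# In-memory conversation store keyed by a per-client token.
# A real backend would call into the compiled model instead of the stub below.
conversations: dict = {}

def generate(history: list) -> str:
    """Placeholder for the model call; returns a canned reply."""
    return f"(model reply to: {history[-1]})"

class ChatRequest(BaseModel):
    token: Optional[str] = None  # omit to start a new conversation
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    token = req.token or uuid.uuid4().hex
    history = conversations.setdefault(token, [])
    history.append(req.message)
    reply = generate(history)
    history.append(reply)
    # Returning the token lets each client keep its own thread distinct.
    return {"token": token, "reply": reply}
```

Run it with `uvicorn app:app` (assuming the file is named `app.py`); each client receives a token on its first request and sends it back with subsequent messages to stay in the same conversation.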
Agree with all comments above. The highlight of this awesome effort is the ability to use a range of models cross-platform at high performance - but doing so requires a mix of languages and compilation. If the core compiled elements simply exposed an API, the functionality built on top of it would be language-independent, and enable both human and automated interaction. I could easily see this work becoming the primary foundation for local LLM applications going forward.
Thanks for your effort. Do you plan to add an API layer on top of this, so your library can be used as a local API?
In my scenario I'd like to host your library in a Docker instance and query it via an API to feed a custom application I've started (see https://github.com/MithrilMan/AIdentities). It would make use of several models, not just for text generation, but of course here I'm only interested in the text-gen part.
Ideally the API should have endpoints for Completions and Embeddings. Is this something you plan to have?