hyperonym / basaran

Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
MIT License

Support for llama.cpp/ggml models #107

Closed by codito 1 year ago

codito commented 1 year ago

Is there a plan to add support for llama.cpp models? This would enable inference on CPUs.

See https://github.com/ggerganov/llama.cpp and its Python bindings at https://github.com/thomasantony/llamacpp-python

Related: https://github.com/hyperonym/basaran/issues/57

peakji commented 1 year ago

There are currently no plans to add support for llama.cpp and related models. The Basaran project will continue to develop around the Hugging Face ecosystem so that it stays compatible with more existing and future models.

We will also try to introduce more universal technologies to optimize CPU inference.

codito commented 1 year ago

Thanks @peakji, that makes sense. Closing this issue as out of scope.