intel-analytics / ipex-llm-tutorial

Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
https://github.com/intel-analytics/bigdl
Apache License 2.0

Python performance is too poor; can you provide a C++ inference library with an OpenAI-compatible API? #73

Open · geffzhang opened 6 months ago

geffzhang commented 6 months ago

We are using bigdl-llm in a production environment, but Python performance is too poor. Can you provide a C++ inference library with an OpenAI-compatible API?

jason-dai commented 6 months ago

Please see https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/vLLM-Serving
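For reference, an OpenAI-compatible server like the vLLM-Serving example exposes the standard `/v1/completions` HTTP endpoint, so clients in any language (including C++) can call it over HTTP regardless of the server's implementation language. A minimal Python sketch of such a client, assuming the server is running locally on port 8000; the host, port, and model name below are placeholders, not values confirmed by the example:

```python
# Minimal sketch: query an OpenAI-compatible completions endpoint.
# Assumes the vLLM-Serving example is running at localhost:8000;
# the model name is a placeholder and must match what the server loaded.
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",  # assumed local endpoint
    json={
        "model": "facebook/opt-125m",        # placeholder model name
        "prompt": "What is BigDL-LLM?",
        "max_tokens": 64,
        "temperature": 0.0,
    },
    timeout=60,
)
response.raise_for_status()
# The OpenAI completions schema returns generated text under choices[0].text.
print(response.json()["choices"][0]["text"])
```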

geffzhang commented 6 months ago

This is written in Python; can a C++ version be added as well? Python's performance is not satisfactory.

jason-dai commented 6 months ago

> This is written in Python; can a C++ version be added as well? Python's performance is not satisfactory.

Unfortunately, there is no such plan at the moment.