Closed: authurlord closed this issue 6 months ago
Hi @authurlord,
Thank you for your suggestion. To be honest, I am not too familiar with FastChat, so I will have to investigate it further. Regarding vLLM, it will most likely be supported in some form eventually, but so far we have not done any development in this direction.
Just add the possibility to change the openai_base and it will work
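For reference, a minimal sketch of what that looks like with the plain `openai` client pointed at FastChat's OpenAI-compatible server (assuming the pre-1.0 `openai` package; the port, model name, and dummy key are only illustrative defaults):

```python
import openai

# FastChat's OpenAI-compatible server (fastchat.serve.openai_api_server)
# listens on http://localhost:8000/v1 by default; no real API key is needed.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "EMPTY"

response = openai.ChatCompletion.create(
    model="vicuna-7b-v1.5",  # whatever model the local worker is serving
    messages=[
        {"role": "user", "content": "Classify the sentiment of: 'great product'"}
    ],
    temperature=0.0,
)
print(response["choices"][0]["message"]["content"])
```

If scikit-llm exposed the base URL as a setting, the same request path it already uses for OpenAI would just be redirected to the local server.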
@bacoco yes, this is the most straightforward solution, which we will do for sure.
We were also thinking about some sort of deeper integration, but have not made much progress on that yet.
Resolved with https://github.com/iryna-kondr/scikit-llm/pull/94
Thanks for your great work! Since https://github.com/lm-sys/FastChat can start a local server for llama2/vicuna whose API is quite similar to OpenAI's, would it be possible to support the FastChat API server so that we can run inference against a local API server?
Besides, is there any plan to support batch inference with https://github.com/vllm-project/vllm? Since the examples in tabular data are similar prompts, batch inference with vLLM could speed up the whole process compared to gpt4all.
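For context, this is roughly the kind of offline batch inference vLLM exposes (a sketch only; the model name and prompts below are placeholders):

```python
from vllm import LLM, SamplingParams

# One prompt per table row; vLLM batches them in a single generate() call
# instead of sending one request per row.
prompts = [
    "Classify the sentiment of: 'great product'",
    "Classify the sentiment of: 'arrived broken'",
]
sampling_params = SamplingParams(temperature=0.0, max_tokens=16)

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # example model
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text.strip())
```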