infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

How do I call a model deployed using fastchat? #498

Open ciaoyizhen opened 5 months ago

ciaoyizhen commented 5 months ago

Describe your problem

Reading the related issues, the advice is to use Ollama to run a local model. However, https://ollama.com/library doesn't support ChatGLM (or supporting ChatGLM with Ollama would take a lot of work), and I'm already using FastChat to deploy other apps, so I'd like to reuse that deployment. Can I serve the model with FastChat and wrap the interface myself with FastAPI, disguising it as Ollama? What key interfaces do I need to provide to RAGFlow?
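One possible approach is a thin shim that translates between Ollama-style chat calls and FastChat's OpenAI-compatible server (started with `python -m fastchat.serve.openai_api_server`). A minimal sketch of the translation layer, assuming FastChat listens at `http://localhost:8000` and that the client speaks the Ollama `/api/chat` request/response shape (model name and URL below are placeholders, not anything RAGFlow mandates):

```python
# Sketch: bridge Ollama-style chat requests to FastChat's OpenAI-compatible API.
# Assumptions: FastChat's openai_api_server runs at FASTCHAT_URL; the request/
# response shapes follow Ollama's /api/chat (non-streaming case only).
import json
import urllib.request

FASTCHAT_URL = "http://localhost:8000/v1/chat/completions"  # assumed address

def ollama_to_openai(ollama_req: dict) -> dict:
    """Translate an Ollama /api/chat request body into an OpenAI-style one."""
    return {
        "model": ollama_req["model"],
        "messages": ollama_req["messages"],
        "stream": ollama_req.get("stream", False),
    }

def openai_to_ollama(model: str, openai_resp: dict) -> dict:
    """Re-wrap an OpenAI chat completion as an Ollama-style response."""
    choice = openai_resp["choices"][0]
    return {
        "model": model,
        "message": {"role": "assistant", "content": choice["message"]["content"]},
        "done": True,
    }

def chat(ollama_req: dict) -> dict:
    """Forward one non-streaming chat call to FastChat and re-wrap the reply."""
    body = json.dumps(ollama_to_openai(ollama_req)).encode()
    req = urllib.request.Request(
        FASTCHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return openai_to_ollama(ollama_req["model"], json.load(resp))
```

The two pure translation functions could then be mounted behind FastAPI routes (e.g. `POST /api/chat`); streaming responses would need additional handling.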

Logistic98 commented 3 months ago

Same need here. Ollama is just a toy and is painful to use; why not standardize on the OpenAI API format for integration? It has already become the industry norm. The models Ollama officially ships are all 4-bit quantized, adding a custom model means converting the format yourself, and there is no vLLM inference optimization.
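Since FastChat already exposes an OpenAI-compatible endpoint, the "standardize on the OpenAI format" route means a client only needs a base URL and a model name. A minimal sketch (the URL, model name, and temperature are illustrative assumptions):

```python
# Sketch: call a FastChat (or any OpenAI-compatible) server directly using
# the standard chat-completions request format. URL/model are placeholders.
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-format chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def post_chat(base_url: str, payload: dict) -> dict:
    """POST the payload to <base_url>/v1/chat/completions, return parsed JSON."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Any backend that accepts this payload shape (FastChat, vLLM's OpenAI server, etc.) is interchangeable from the client's point of view.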