xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

When started via supervisor, the cache_status field is missing, so the front-end cache filter cannot show already-cached models #2424

Open Unexpectedlyc opened 6 days ago

Unexpectedlyc commented 6 days ago

System Info

Windows 10, Python 3.10, xinference v0.15.0; all other packages are the versions installed by xinference via pip install "xinference[transformers]"

Running Xinference with Docker?

Version info

xinference v0.15.0

The command used to start Xinference

    xinference-supervisor --host 127.0.0.1
    xinference-worker --host 127.0.0.1

Reproduction

[screenshots: 微信图片_20241011154744, 微信图片_20241011154751]

These show a local start. When started via supervisor, the front end calls the same endpoint, but the returned result has no cache_status field.

[screenshot: 微信图片_20241011154756]
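
A quick way to see the difference without the web UI (a minimal sketch on my part; the /v1/model_registrations/LLM endpoint, the detailed=true parameter, and the default port 9997 are my assumptions about what the front end queries, not something confirmed here):

    # Sketch: list detailed LLM registrations and flag specs missing cache_status.
    # Endpoint path, detailed=true, and port 9997 are assumptions.
    import requests

    BASE_URL = "http://127.0.0.1:9997"

    resp = requests.get(
        f"{BASE_URL}/v1/model_registrations/LLM",
        params={"detailed": "true"},
    )
    resp.raise_for_status()

    for reg in resp.json():
        for spec in reg.get("model_specs", []):
            if "cache_status" not in spec:
                # Under a supervisor/worker start, specs end up here because
                # _to_llm_reg skips cache_status for non-local deployments.
                print(reg["model_name"], "is missing cache_status")
                break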

Expected behavior

Cached models should be displayed correctly when Xinference is started via supervisor.

Unexpectedlyc commented 6 days ago

There is a TODO in core/supervisor.py. Could you explain what problem arises in a cluster deployment that prevents doing this?

    async def _to_llm_reg(
        self, llm_family: "LLMFamilyV1", is_builtin: bool
    ) -> Dict[str, Any]:
        from ..model.llm import get_cache_status

        instance_cnt = await self.get_instance_count(llm_family.model_name)
        version_cnt = await self.get_model_version_count(llm_family.model_name)

        if self.is_local_deployment():
            specs = []
            # TODO: does not work when the supervisor and worker are running on separate nodes.
            for spec in llm_family.model_specs:
                # get_cache_status inspects the local filesystem of the node it
                # runs on, hence the TODO above about multi-node deployments.
                cache_status = get_cache_status(llm_family, spec)
                specs.append({**spec.dict(), "cache_status": cache_status})
            res = {**llm_family.dict(), "is_builtin": is_builtin, "model_specs": specs}
        else:
            # Distributed deployment: cache_status is omitted from every spec,
            # which is what the front-end cache filter trips over.
            print(llm_family)
            res = {**llm_family.dict(), "is_builtin": is_builtin}
        res["model_version_count"] = version_cnt
        res["model_instance_count"] = instance_cnt
        return res
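
For context, my own reading of the TODO (not an authoritative answer): get_cache_status checks the model cache directory on the node where it executes, so on a multi-node cluster the supervisor would be reporting its own filesystem rather than the workers'. A hypothetical sketch of delegating the check to workers follows; the worker-side get_cache_status call and the _worker_address_to_worker mapping are assumptions about the actor API, not verified xinference internals:

    # Hypothetical sketch only: ask every worker whether it has the spec cached
    # and treat the spec as cached if any worker says yes. The worker method
    # get_cache_status and the _worker_address_to_worker mapping are assumptions.
    async def _gather_cache_status(self, llm_family, spec) -> bool:
        for worker_ref in self._worker_address_to_worker.values():
            if await worker_ref.get_cache_status(llm_family, spec):
                return True
        return False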