[Open] yebanliuying opened this issue 3 months ago
MemoryWithRag's default embedding model also calls the DashScope API, which is what causes this problem. Downloading and using a local embedding model should work directly in theory, but it has not been tested yet; we will provide it once we have tested it. We use the DashScope API here because local models support only limited concurrency, and we previously ran into slow responses and timeouts.
The other two MemoryWithXxx classes download and use open-source embedding models.
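For context: the DashScope SDK will not run without an API key, which is exactly where the assertion reported in this issue comes from. Below is a minimal sketch of the kind of call the default embedding path makes; the TextEmbedding API shown is the standard DashScope one, but whether MemoryWithRag uses this exact endpoint internally is an assumption.

import os
from dashscope import TextEmbedding

# The SDK reads the key from the environment; an offline deployment
# has no valid value to provide, so any default DashScope embedding
# path fails up front with the same assertion reported here.
assert os.environ.get('DASHSCOPE_API_KEY'), \
    'DASHSCOPE_API_KEY should be set in environ'

resp = TextEmbedding.call(
    model=TextEmbedding.Models.text_embedding_v1,
    input='where does the assertion come from?',
)
print(resp.output['embeddings'][0]['embedding'][:8])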
I am now getting an error when calling MemoryWithRetrievalKnowledge with a local model. Is that caused by the reason described above? I see that modelscope_hub.py contains model_id: str = "damo/nlp_corom_sentence-embedding_english-base". The error message is: ImportError: Could not import some python packages. Please install it with pip install modelscope.
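That ImportError simply means the modelscope package itself is missing; pip install modelscope should clear it. Once it is installed, here is a minimal sketch of loading the same embedding model locally through the ModelScope pipeline API (the 'text_embedding' output key is an assumption to verify against your modelscope version):

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Model ID taken from modelscope_hub.py above. The weights are
# downloaded once into the ModelScope cache; inference then runs
# entirely locally, with no DashScope call involved.
embed = pipeline(
    task=Tasks.sentence_embedding,
    model='damo/nlp_corom_sentence-embedding_english-base',
)
result = embed(input={'source_sentence': ['a quick smoke test']})
print(result['text_embedding'].shape)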
When will local embedding models be supported? Honestly, I understand the intent of leaning on ModelScope as much as possible, but in many cases, environment and policy constraints force a fully local deployment with no access to external networks.
Using the langchain example, the embedding model currently ends up in the ModelScope cache directory after download. It will not be downloaded repeatedly, but migrating it is still a bit troublesome; this can usually be managed by setting the cache and download directories via environment variables. So downloading and managing local embedding models is not a big problem, especially with the llamaindex example, where a local model can be called explicitly via the 'local' modifier in Settings.
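For example, a sketch of that environment-variable approach (MODELSCOPE_CACHE is the commonly used variable; treat the exact name and the placeholder path as assumptions to check against your modelscope version):

import os

# Set before importing modelscope so the hub picks it up. The cache
# can be pre-populated on a networked machine and then copied to the
# offline host; '/data/modelscope_cache' is only a placeholder path.
os.environ['MODELSCOPE_CACHE'] = '/data/modelscope_cache'

from modelscope.hub.snapshot_download import snapshot_download

local_dir = snapshot_download('damo/nlp_gte_sentence-embedding_chinese-base')
print(local_dir)  # resolved local path under the cache directory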
The problem I am facing now is that, for the MemoryWithRetrievalKnowledge series of examples, there is a dependency between the LLM and the embedding model. Some quick tests: qwen-max + damo/nlp_gte_sentence-embedding_chinese-base works; qwen-max + Xorbits/bge-large-zh-v1.5 errors; siliconflow's qwen-7b + damo/nlp_gte_sentence-embedding_chinese-base errors. I suspect switching to ollama will not work well either; simple testing suggests that MemoryWithRetrievalKnowledge is rather heavily coupled internally.
By contrast, the llamaindex_rag example passes these same combinations easily. If the maintainers have time, please take a look at which angle this problem should be approached from.
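For comparison, a minimal sketch of the explicit-local llamaindex setup mentioned above (requires the llama-index-embeddings-huggingface package; BAAI/bge-large-zh-v1.5 is assumed here as the HuggingFace counterpart of the Xorbits ModelScope mirror used in the tests):

from llama_index.core import Settings
from llama_index.core.embeddings import resolve_embed_model

# The 'local:<model>' form resolves to a HuggingFaceEmbedding that
# runs on this machine and never calls a remote embedding API, which
# is why the llamaindex_rag example is easy to decouple.
Settings.embed_model = resolve_embed_model('local:BAAI/bge-large-zh-v1.5')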
Initial Checks
What happened + What you expected to happen
Error: AssertionError: DASHSCOPE_API_KEY should be set in environ. The local environment has no internet access, so it can only connect to a local ollama instance.
Versions / Dependencies
Latest version
Reproduction script
# Import path assumed from the modelscope-agent layout:
from modelscope_agent.memory import MemoryWithRag

llm_config = {
    'model': 'qwen2',
    'model_server': 'ollama',
}
function_list = []
memory = MemoryWithRag(
    urls=['tests/samples/常见QA.pdf'],
    function_list=function_list,
    llm=llm_config,
    use_knowledge_cache=False,
)
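Before blaming the memory class, it may also be worth confirming that the local Ollama server the llm_config points at is reachable; a minimal sketch against the default Ollama REST endpoint (host, port, and model name are assumptions matching the script above):

import requests

# Default Ollama endpoint; adjust if the server runs elsewhere.
resp = requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'qwen2', 'prompt': 'ping', 'stream': False},
    timeout=60,
)
print(resp.json()['response'])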
Issue Severity
High: It blocks me from completing my task.