gusye1234 / nano-graphrag

A simple, easy-to-hack GraphRAG implementation
MIT License
1.7k stars, 164 forks

Is the embedding dimension fixed? #13

Closed czy1999 closed 3 months ago

czy1999 commented 3 months ago

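Following the sample in the examples, I switched both the LLM and the embedding (to a local embedding model), and now I get a dimension mismatch error: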

def insert():
    from time import time

    with open("./book.txt", encoding="utf-8-sig") as f:
        FAKE_TEXT = f.read()

    remove_if_exist(f"{WORKING_DIR}/milvus_lite.db")
    remove_if_exist(f"{WORKING_DIR}/kv_store_full_docs.json")
    remove_if_exist(f"{WORKING_DIR}/kv_store_text_chunks.json")
    remove_if_exist(f"{WORKING_DIR}/kv_store_community_reports.json")
    remove_if_exist(f"{WORKING_DIR}/graph_chunk_entity_relation.graphml")
    half_len = len(FAKE_TEXT) // 2
    rag = GraphRAG(
        working_dir=WORKING_DIR,
        enable_llm_cache=True,
        best_model_func=deepseepk_model_if_cache,
        cheap_model_func=deepseepk_model_if_cache,
        embedding_func=local_embedding,
    )
    start = time()
    rag.insert(FAKE_TEXT)
    print("indexing time:", time() - start)

This fails with the following error:

  File "/home/xxxx/anaconda3/lib/python3.11/site-packages/nano_vectordb/dbs.py", line 71, in __post_init__
    storage["embedding_dim"] == self.embedding_dim
AssertionError: Embedding dim mismatch, expected: 384, but loaded: 1536

Is this caused by node2vec_params? I tested that, and it still seems to raise the error. https://github.com/gusye1234/nano-graphrag/blob/3a8c406211fc415e4c45f014cce5f8d34d40062a/nano_graphrag/graphrag.py#L56-L66

gusye1234 commented 3 months ago

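Did you already run once in this working dir with the OpenAI embedding, and then run again with the local embedding? The error means that nano-graphrag loaded the vector database left over from the earlier run and found that the stored vector dimension (1536) does not match the current one (384).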
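
To illustrate the failure mode described above, here is a minimal sketch of a file-backed store that records its embedding dimension on disk and checks it on reload. `TinyVectorStore` and `reproduce_mismatch` are hypothetical names made up for this example. This is not nano_vectordb's actual code, only an assumed approximation of the kind of check shown in the traceback.

```python
import json
import os
import tempfile


class TinyVectorStore:
    """Hypothetical stand-in for a file-backed vector store (not nano_vectordb's code)."""

    def __init__(self, path: str, embedding_dim: int):
        self.path = path
        self.embedding_dim = embedding_dim
        if os.path.exists(path):
            # A file from an earlier run exists: load it and verify the dimension.
            with open(path, encoding="utf-8") as f:
                storage = json.load(f)
            assert storage["embedding_dim"] == embedding_dim, (
                f"Embedding dim mismatch, expected: {embedding_dim}, "
                f"but loaded: {storage['embedding_dim']}"
            )
            self.vectors = storage["vectors"]
        else:
            # Fresh working directory: start empty.
            self.vectors = []

    def save(self) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(
                {"embedding_dim": self.embedding_dim, "vectors": self.vectors}, f
            )


def reproduce_mismatch() -> str:
    """Run once with a 1536-dim embedding, then again with a 384-dim one."""
    with tempfile.TemporaryDirectory() as working_dir:
        path = os.path.join(working_dir, "vectors.json")
        TinyVectorStore(path, embedding_dim=1536).save()  # first run
        try:
            TinyVectorStore(path, embedding_dim=384)  # second run, same directory
        except AssertionError as err:
            return str(err)
    return "no error"


if __name__ == "__main__":
    print(reproduce_mismatch())
```

The point of the sketch is that the dimension is a property of the data already on disk, so changing the embedding function alone is not enough when an old store is still present in the same directory.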

czy1999 commented 3 months ago

Yes, deleting the old working dir fixed it. Thanks!
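
For anyone hitting the same error, a small sketch of two ways to avoid reusing a stale index: wipe the whole working directory before re-indexing, or keep one directory per embedding setup. Both helpers (`reset_working_dir`, `working_dir_for`) and the naming scheme are assumptions for illustration only and are not part of nano-graphrag.

```python
import os
import shutil


def reset_working_dir(working_dir: str) -> None:
    """Delete the directory with everything cached in it, then recreate it empty."""
    shutil.rmtree(working_dir, ignore_errors=True)
    os.makedirs(working_dir)


def working_dir_for(base_dir: str, embedding_name: str, embedding_dim: int) -> str:
    """Hypothetical naming scheme: a separate directory per embedding setup."""
    return os.path.join(base_dir, f"{embedding_name}_{embedding_dim}")


if __name__ == "__main__":
    # Example: a local 384-dim embedding gets its own directory,
    # so it never loads vectors written by a 1536-dim model.
    target = working_dir_for("./nano_graphrag_cache", "local", 384)
    reset_working_dir(target)
    print("fresh working dir:", target)
```

Removing the entire directory is more robust than removing individual files by name, since a per-file cleanup can miss whichever file holds the stored vectors.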