neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs
https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
Apache License 2.0
2.29k stars, 367 forks

Using bert-base-chinese as EMBEDDING_MODEL #608

Closed. gy850222 closed this issue 3 months ago

gy850222 commented 3 months ago

I am trying to use bert-base-chinese as the EMBEDDING_MODEL, but I keep getting the following error:

ValueError: The provided embedding function and vector index dimensions do not match.
Embedding function dimension: 768
Vector index dimension: 384
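The error says that the model produces 768-dimensional vectors while the existing vector index expects 384. A minimal sketch of checking this before writing anything to the database is shown here. The helper names and the stand-in embedders are hypothetical and are not part of llm-graph-builder; in practice `embed_query` would be the method of the real embeddings object.

```python
from typing import Callable, Sequence


def embedding_dimension(
    embed_query: Callable[[str], Sequence[float]],
    probe: str = "dimension probe",
) -> int:
    """Embed a short probe string and return the length of the vector."""
    return len(embed_query(probe))


def check_index_compatibility(
    embed_query: Callable[[str], Sequence[float]],
    index_dimension: int,
) -> int:
    """Raise ValueError if the model and the index disagree on dimension."""
    model_dimension = embedding_dimension(embed_query)
    if model_dimension != index_dimension:
        raise ValueError(
            "Embedding function dimension: "
            f"{model_dimension}, vector index dimension: {index_dimension}"
        )
    return model_dimension


# Stand-in embedders (assumptions for illustration only): they return
# zero vectors of the sizes reported in the error message above.
def fake_bert_embed(text: str) -> Sequence[float]:
    return [0.0] * 768


def fake_minilm_embed(text: str) -> Sequence[float]:
    return [0.0] * 384
```

With a real model, passing `embeddings.embed_query` together with the dimension of the index already present in the database would surface the mismatch early, before any chunks are written.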

I have made the following changes:

Updated the .env configuration:

    #EMBEDDING_MODEL = "all-MiniLM-L6-v2"
    EMBEDDING_MODEL = "bert-base-chinese"

Modified common_fn.py:

    embeddings = SentenceTransformerEmbeddings(
        #model_name="all-MiniLM-L6-v2"#, cache_folder="/embedding_model"
        model_name="bert-base-chinese"#, cache_folder="/embedding_model"
    )
    #dimension = 384
    dimension = 768
    logging.info(f"Embedding: Using SentenceTransformer , Dimension:{dimension}")

Even updated make_relationships.py:

    CREATE VECTOR INDEX vector if not exists for (c:Chunk) on (c.embedding)
    OPTIONS {indexConfig: {
      `vector.dimensions`: 768,
      `vector.similarity_function`: 'cosine'
    }}

However, none of these changes has resolved the issue, and I get the same error with the DEV version as well. This problem has been troubling me for three days. I have tried many solutions, including reading and modifying the source code, but without success. Any tips or suggestions would be greatly appreciated. @jexp
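One thing worth checking in a situation like this: `CREATE VECTOR INDEX ... IF NOT EXISTS` leaves an index that already exists untouched, so an index created earlier with 384 dimensions keeps that size even after the statement is edited to say 768. Whether that is the cause here is an assumption, not something confirmed in this thread. The sketch below only builds the Cypher strings for dropping and recreating the index; the function name and its defaults are hypothetical, taken from the statement quoted above.

```python
from typing import List


def rebuild_vector_index_statements(
    dimension: int,
    index_name: str = "vector",
    label: str = "Chunk",
    prop: str = "embedding",
    similarity: str = "cosine",
) -> List[str]:
    """Return Cypher that drops the old index and creates a new one."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    # Drop first: IF NOT EXISTS on CREATE would otherwise keep the old index.
    drop = f"DROP INDEX {index_name} IF EXISTS"
    create = (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS "
        f"FOR (c:{label}) ON (c.{prop}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimension}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )
    return [drop, create]


if __name__ == "__main__":
    for statement in rebuild_vector_index_statements(768):
        print(statement + ";")
```

The statements could then be run in Neo4j Browser or through a driver session. Embeddings already stored on Chunk nodes with the old dimension would still need to be regenerated with the new model.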

gy850222 commented 3 months ago

As this issue has been troubling me for a long time, I look forward to your prompt guidance and response. :) @jexp

gy850222 commented 3 months ago

I know where the problem lies now. I'm sorry, I hadn't realized that bert-base-chinese and all-MiniLM-L6-v2 are two different models that are incompatible in this scenario. I have successfully used other models, such as paraphrase-multilingual-MiniLM-L12-v2.