langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

PineconeEmbeddings.dimension is ignored #26507

Status: Open. davidgilbertson opened this issue 2 months ago

davidgilbertson commented 2 months ago

Example Code

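The following raises an AssertionError, because the dimension argument is ignored and the returned vector does not have the requested length: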

from langchain_pinecone import PineconeEmbeddings

embeddings = PineconeEmbeddings(
    model="multilingual-e5-large",
    dimension=512,
)

vector = embeddings.embed_query("Why?")
assert len(vector) == 512

Error Message and Stack Trace (if applicable)

No response

Description

I see that you have some tests for this, but they use the default embedding dimension of 1024, so they don't catch the fact that no other dimension works.

I suspect that if you change that line in the test to something like DIMENSION = 333, the tests will fail.
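
To illustrate why a test pinned to the default dimension cannot catch this, here is a minimal, self-contained sketch. It does not call Pinecone: `FakeEmbeddings` and `dimension_is_respected` are hypothetical names, and the class is only a stand-in that mimics the behaviour reported above (the `dimension` argument is accepted but ignored).

```python
from dataclasses import dataclass


@dataclass
class FakeEmbeddings:
    """Hypothetical stand-in for an embeddings client, NOT the real PineconeEmbeddings.

    It mimics the behaviour reported in this issue: `dimension` is accepted
    but ignored, so every vector has the model's native size.
    """

    dimension: int = 1024
    native_dimension: int = 1024  # assumed native size of the model

    def embed_query(self, text: str) -> list[float]:
        # The requested `dimension` is deliberately not used here.
        return [0.0] * self.native_dimension


def dimension_is_respected(dimension: int) -> bool:
    """Return True if the returned vector has the requested length."""
    embeddings = FakeEmbeddings(dimension=dimension)
    return len(embeddings.embed_query("Why?")) == dimension


# Check the default dimension alongside two non-default ones.
results = {d: dimension_is_respected(d) for d in (1024, 512, 333)}
```

Only the 1024 case passes, so a test suite that checks a single value equal to the default would stay green. Running the same check over several dimensions against the real class would be one way to expose the problem.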

System Info

langchain==0.2.16
langchain-anthropic==0.1.22
langchain-community==0.2.16
langchain-core==0.3.0
langchain-google-genai==1.0.8
langchain-google-vertexai==1.0.8
langchain-milvus==0.1.4
langchain-openai==0.1.20
langchain-pinecone==0.2.0
langchain-text-splitters==0.2.2

Windows, Python 3.10

efriis commented 2 months ago

@gdj0nes could you look at this? Agreed, the dimension parameter doesn't seem to do anything, and it looks like it has been that way since the initial Pinecone embedding integration in #24515.

gdj0nes commented 2 months ago

@davidgilbertson thanks for raising the issue. The multilingual-e5-large model does not support variable dimensions. We'll look into better error handling for cases like this.
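
One possible shape for that error handling, as a rough sketch only: reject an unsupported dimension up front instead of silently ignoring it. `SUPPORTED_DIMENSIONS` and `validate_dimension` are hypothetical names and are not part of langchain-pinecone; the table assumes multilingual-e5-large returns only 1024-dimensional vectors, as discussed in this thread.

```python
from typing import Optional

# Assumed table of model name -> dimensions the model can return.
# Illustrative only; a real implementation would need an authoritative source.
SUPPORTED_DIMENSIONS: dict[str, set[int]] = {
    "multilingual-e5-large": {1024},
}


def validate_dimension(model: str, dimension: Optional[int]) -> None:
    """Raise ValueError if `dimension` is not supported by `model`.

    Unknown models and an unset dimension are passed through unchecked.
    """
    supported = SUPPORTED_DIMENSIONS.get(model)
    if supported is None or dimension is None:
        return
    if dimension not in supported:
        raise ValueError(
            f"Model {model!r} does not support dimension={dimension}; "
            f"supported values: {sorted(supported)}"
        )


# Usage: the default passes, while the value from the report fails fast.
validate_dimension("multilingual-e5-large", 1024)
try:
    validate_dimension("multilingual-e5-large", 512)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

With a check like this, the example at the top of the issue would fail when the embeddings object is configured, with a message naming the model and the supported sizes, and not later on a length assertion.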