Integration issue between langchain-pinecone and Google Vertex AI textembedding-gecko@003

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import os

from langchain_core.documents import Document
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone

if __name__ == '__main__':
    input = 'where is my dog?'

    # create the embedding function using the 'textembedding-gecko@003' model
    vertexai_embedding_003 = VertexAIEmbeddings(model_name='textembedding-gecko@003')

    # init a Pinecone vector store with the Vertex AI embedding
    pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"), environment='us-central1-gcp')
    vector_store = PineconeVectorStore(index_name='embedding-test', embedding=vertexai_embedding_003)

    # create a test document
    doc = Document(
        page_content=input,
        metadata={'category': 'pet'}
    )
    # save it in the index
    vector_store.add_documents([doc])

    # similarity search for the text we just inserted
    print(vector_store.similarity_search_with_score(input))

Error Message and Stack Trace (if applicable)

Screenshots of the different vectors produced by embedding the same input ('where is my dog?'):

Embedding result when doing insertion

Embedding result when doing query

No response

Description

Hello LangChain team, I found an embedding mismatch between adding documents to Pinecone and running similarity_search_with_score against Pinecone when using the Google Vertex AI model 'textembedding-gecko@003'. It only happens with 'textembedding-gecko@003'; 'textembedding-gecko@001' works fine.

How to reproduce

1. Add the input string with vector_store.add_documents([doc]). Before the insertion, the code computes the vector with 'textembedding-gecko@003', then stores the vector and the metadata in the vector store.
2. Search for exactly the same string with similarity_search_with_score. The expected score is 1, because the query is identical to the stored text. In practice it returns 0.79, because the query is embedded differently from the stored document.

After debugging the code, I found that the add-document stage and the search stage embed text in different ways. Here is a screenshot of the issue. We can see that adding documents and querying documents pass different 'embedding_task_type' values, which is why the same input produces different embeddings.
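The difference between the two paths can also be checked without Pinecone, by embedding the same text through both methods and comparing the vectors. This is a minimal sketch, assuming only the usual LangChain embeddings interface (embed_documents and embed_query). The helpers cosine_similarity and compare_embedding_paths are hypothetical names, and StubAsymmetricEmbeddings is a made-up stand-in so the sketch runs without Vertex AI credentials:

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare_embedding_paths(embeddings, text: str) -> float:
    """Embed `text` via the document path and the query path, then compare.

    `embeddings` is any object exposing embed_documents(texts) and
    embed_query(text). A value of 1.0 means both paths agree.
    """
    doc_vector = embeddings.embed_documents([text])[0]
    query_vector = embeddings.embed_query(text)
    return cosine_similarity(doc_vector, query_vector)


class StubAsymmetricEmbeddings:
    """Hypothetical stand-in, NOT the Vertex AI model.

    It returns fixed, different vectors for the two paths to imitate an
    embedder whose document and query embeddings disagree.
    """

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[1.0, 0.0] for _ in texts]

    def embed_query(self, text: str) -> List[float]:
        return [1.0, 1.0]


if __name__ == "__main__":
    score = compare_embedding_paths(StubAsymmetricEmbeddings(), "where is my dog?")
    print(f"document-vs-query similarity: {score:.4f}")
```

With the real model, one would pass VertexAIEmbeddings(model_name='textembedding-gecko@003') in place of the stub. A result clearly below 1 for identical text would confirm that the mismatch comes from the embedding step rather than from Pinecone.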

Meanwhile, the 'embedding_task_type' parameter is hardcoded in these two functions, so the user is not able to customize it.
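Until the task type is configurable, one possible workaround is to wrap the embedding object so that queries go through the same path as documents. This is only a sketch under assumptions: SymmetricEmbeddings is a hypothetical class, not part of LangChain, it does not set 'embedding_task_type' itself, and the stub embedder is again a made-up stand-in for the real model:

```python
from typing import List


class SymmetricEmbeddings:
    """Hypothetical wrapper that embeds queries via the document path.

    Assumes the wrapped object exposes embed_documents(texts), as
    LangChain embedding classes do.
    """

    def __init__(self, inner) -> None:
        self._inner = inner

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Documents are embedded exactly as the wrapped object would embed them.
        return self._inner.embed_documents(texts)

    def embed_query(self, text: str) -> List[float]:
        # Reuse the document path so a query identical to a stored text
        # gets the same vector as that stored text.
        return self._inner.embed_documents([text])[0]


class StubAsymmetricEmbeddings:
    """Hypothetical stand-in, NOT the Vertex AI model."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[1.0, 0.0] for _ in texts]

    def embed_query(self, text: str) -> List[float]:
        return [1.0, 1.0]


if __name__ == "__main__":
    wrapped = SymmetricEmbeddings(StubAsymmetricEmbeddings())
    text = "where is my dog?"
    print(wrapped.embed_query(text) == wrapped.embed_documents([text])[0])
```

Two caveats apply. Whether PineconeVectorStore accepts a duck-typed object like this, or requires a subclass of LangChain's Embeddings base class, is not verified here. The wrapper also gives up the query-specific task type, which may affect retrieval quality for queries that are not identical to the stored text.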

Here is Google's documentation explaining this parameter: https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.language_models.TextEmbeddingInput

In conclusion, if developers follow the LangChain documentation to insert and query with 'textembedding-gecko@003', it is very easy to run into this issue.

System Info

langchain==0.1.14
langchain_google_vertexai==0.1.2
langchain-pinecone==0.0.3
