langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Neo4j similaritySearchWithScore crashes on undefined #4752

Closed faileon closed 6 months ago

faileon commented 7 months ago

Description

Trying to get similarity results with score like this:

    const result = await vectorStore.similaritySearchWithScore('What is foo?', 6);

Produces the following error in the removeLuceneChars function:

    modifiedText = modifiedText.split(char).join(' ');
    TypeError: Cannot read properties of undefined (reading 'split')

The problem is that removeLuceneChars tries to process the query, which the base class does not pass along. As far as I understand, the query is not needed here: similaritySearchWithScore just embeds the query into a vector and passes that vector to similaritySearchVectorWithScore.

The workaround I currently use is to create the vector myself, call similaritySearchVectorWithScore directly, and pass an empty query:

    const queryVector = await embeddings.embedQuery('What is foo?');
    const result = await vectorStore.similaritySearchVectorWithScore(queryVector, 6, '');
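For readers following along, the call path described above can be modeled roughly as follows. This is a simplified sketch, not the library's source: `BaseStoreSketch`, `Neo4jStoreSketch`, and the toy `embed` method are hypothetical stand-ins, and the only behavior taken from the report is that the base method forwards the vector without a query while the Neo4j method calls `split` on that query.

```typescript
// Simplified model of the reported call path. All names here are
// hypothetical stand-ins, not the actual langchainjs classes.
type Scored = [string, number];

abstract class BaseStoreSketch {
  // Toy replacement for embeddings.embedQuery: one number per query.
  protected async embed(text: string): Promise<number[]> {
    return [text.length];
  }

  abstract similaritySearchVectorWithScore(
    vector: number[],
    k: number,
    third?: unknown
  ): Promise<Scored[]>;

  // The base method embeds the query and forwards only the vector and k,
  // so the subclass never receives the original query text.
  async similaritySearchWithScore(query: string, k: number): Promise<Scored[]> {
    const vector = await this.embed(query);
    return this.similaritySearchVectorWithScore(vector, k);
  }
}

class Neo4jStoreSketch extends BaseStoreSketch {
  async similaritySearchVectorWithScore(
    vector: number[],
    k: number,
    query?: string
  ): Promise<Scored[]> {
    // The cast mirrors code that assumes a query is always present;
    // when query is undefined, split throws the reported TypeError.
    const keywords = (query as string).split("?").join(" ").trim();
    const hit: Scored = [`doc matching "${keywords}"`, vector[0] ?? 0];
    return [hit].slice(0, k);
  }
}
```

In this model, `new Neo4jStoreSketch().similaritySearchWithScore('What is foo?', 6)` rejects with a TypeError, while calling `similaritySearchVectorWithScore` directly with `''` as the third argument resolves normally, which matches the workaround.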

https://github.com/langchain-ai/langchainjs/blob/37d68a5b1a2ab1039e1459cd6a35c43ae8e3ddd6/libs/langchain-community/src/vectorstores/neo4j_vector.ts#L618

System Info

    pnpm 8.6.2
    node 20.10.0
    "@langchain/community": "^0.0.37",
    "@langchain/core": "^0.1.45",
    "@langchain/openai": "^0.0.19",
    "langchain": "^0.1.27",

dosubot[bot] commented 7 months ago

Thank you for providing detailed information about the issue you're experiencing with the similaritySearchWithScore function in the LangChain.js library. Based on the error message you provided, it seems like the issue is occurring within the removeLuceneChars function, which is likely trying to call the split method on an undefined value.

The similaritySearchWithScore function is designed to return a Promise that resolves to an array of tuples, each containing a DocumentInterface object and a number representing the similarity score between the query and the document. It does this by embedding the query into a vector, performing a similarity search with the resulting vector, and returning the documents and their corresponding similarity scores.

Given that the error occurs when calling similaritySearchWithScore with the query 'What is foo?', it's possible that the issue lies in how the query is being processed or embedded into a vector. The removeLuceneChars function might be involved in this process, and if it's trying to call split on an undefined value, it could be that the query isn't being properly converted into a format that removeLuceneChars can handle.

Your workaround of creating the vector yourself and calling similaritySearchVectorWithScore directly seems to be a valid solution for now. However, it would be ideal if similaritySearchWithScore could handle this process itself, as it's designed to do.

Unfortunately, without access to the source code of the removeLuceneChars function and the exact stack trace of the error, it's difficult to pinpoint the exact cause of the issue. If you could provide these additional details, it would greatly assist in diagnosing and resolving the problem.

In the meantime, you might want to check if the issue persists when using different queries or when adjusting the k parameter in similaritySearchWithScore. This could help determine if the issue is specific to certain inputs or if it's a more general problem with the function.

I hope this information is helpful, and I look forward to your response. If you have any further questions or concerns, please don't hesitate to ask.

jacoblee93 commented 7 months ago

Thanks for the heads up - @tomasonjo if you're around could you have a quick look? Otherwise I can try later.

tomasonjo commented 7 months ago

The Lucene query is there for hybrid search. It seems we need to either allow null values when the method is called directly, or add a single if clause so that Lucene characters are only removed when the query is not null. I'm travelling, but maybe @easwee can add a PR.
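A minimal sketch of the "single if clause" option might look like the following. It is not the merged fix: the name `removeLuceneCharsSafe` and the character list are assumptions made for illustration, and only the `split(char).join(' ')` loop is taken from the error output in the report.

```typescript
// Characters treated as special in Lucene query syntax.
// This list is an assumption for the sketch, not copied from the library.
const LUCENE_SPECIAL_CHARS: string[] = [
  "+", "-", "&", "|", "!", "(", ")", "{", "}",
  "[", "]", "^", '"', "~", "*", "?", ":", "\\", "/",
];

// Hypothetical guarded variant of removeLuceneChars.
function removeLuceneCharsSafe(text?: string | null): string | null {
  // The guard: with no keyword query there is nothing to sanitize,
  // so return null instead of calling split on undefined.
  if (text === undefined || text === null) {
    return null;
  }
  let modifiedText = text;
  for (const char of LUCENE_SPECIAL_CHARS) {
    // Same replacement as in the reported stack trace line.
    modifiedText = modifiedText.split(char).join(" ");
  }
  return modifiedText.trim();
}
```

With this guard, `removeLuceneCharsSafe(undefined)` returns `null` rather than throwing, and `removeLuceneCharsSafe('What is foo?')` returns `'What is foo'`. The caller would still need to decide what a null keyword query means for hybrid search, which this sketch does not address.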