langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License
12.39k stars 2.09k forks

IN operator fails in Prisma vector store #6082

Closed shan-mx closed 2 months ago

shan-mx commented 2 months ago

Example Code

Most of the code is copied from the documentation. The embedding model is changed to Jina, and the id of Document is changed to an Int with autoincrement. The Postgres instance was created by following the instructions in the documentation.

import type { Document } from "@prisma/client";

import { JinaEmbeddings } from "@langchain/community/embeddings/jina";
import { PrismaVectorStore } from "@langchain/community/vectorstores/prisma";
import { Prisma, PrismaClient } from "@prisma/client";

import { env } from "./lib/env.js";

const embeddings = new JinaEmbeddings({
  apiKey: env.JINA_API_KEY,
  model: "jina-embeddings-v2-base-en",
});

const db = new PrismaClient();

const vectorStore = PrismaVectorStore.withModel<Document>(db).create(
  embeddings,
  {
    prisma: Prisma,
    tableName: "Document",
    vectorColumnName: "vector",
    columns: {
      id: PrismaVectorStore.IdColumn,
      content: PrismaVectorStore.ContentColumn,
    },
  },
);

const texts = ["Hello world", "Bye bye", "What's this?"];

const docs = await db.$transaction(
  texts.map((content) => db.document.create({ data: { content } })),
);

await vectorStore.addModels(docs);

const resultOne = await vectorStore.similaritySearch("Hello world", 1, {
  id: {
    in: docs.map((doc) => doc.id),
  },
});

console.log(resultOne);

Prisma Schema:

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

model Document {
  id      Int                    @id @default(autoincrement())
  content String
  vector  Unsupported("vector")?
}

Error Message and Stack Trace (if applicable)

Error: Invalid filter: IN operator requires an array of strings. Received: [ 7, 8, 9 ]

file:///Users/a/DevProjects/prisma-vector-test/node_modules/.pnpm/@langchain+community@0.2.19_ignore@5.3.1_openai@4.52.7/node_modules/@langchain/community/dist/vectorstores/prisma.js:246
throw new Error(`Invalid filter: IN operator requires an array of strings. Received: ${JSON.stringify(value, null, 2)}`);

Description

I'm trying to use the IN operator to filter the results of a Prisma vector store query. The argument is correctly inferred as number[], but I get a runtime error saying that the IN operator requires an array of strings. I cannot change the values to string[], as that leads to a type error.
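To illustrate the mismatch described above, here is a minimal sketch of a strings-only runtime check. It is a hypothetical stand-in reconstructed from the error message, not the library's actual source; the function name `assertStringArray` is an assumption made for this example.

```typescript
// Hypothetical stand-in for a strings-only check, reconstructed from the
// error message in this issue. It is NOT copied from @langchain/community.
function assertStringArray(value: unknown): asserts value is string[] {
  const ok =
    Array.isArray(value) && value.every((item) => typeof item === "string");
  if (!ok) {
    throw new Error(
      `Invalid filter: IN operator requires an array of strings. Received: ${JSON.stringify(
        value,
        null,
        2,
      )}`,
    );
  }
}

// The filter type for an Int id column is number[], so this value type-checks
// at the call site, yet a check like the one above rejects it at runtime.
const ids: number[] = [7, 8, 9];

try {
  assertStringArray(ids);
} catch (error) {
  console.error((error as Error).message);
}
```

The point of the sketch is that the compile-time type (number[]) and the runtime check (strings only) disagree, so no value can satisfy both.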

System Info

@langchain/community@0.2.19 | MIT | deps: 11 | versions: 89
Third-party integrations for LangChain.js
https://github.com/langchain-ai/langchainjs/tree/main/libs/langchain-community/

platform mac

Node v20.11.1

shan-mx commented 2 months ago

The issue comes from this line: https://github.com/langchain-ai/langchainjs/blob/dfd9a2af3672e208039602bdb34e7822f1bed1c2/libs/langchain-community/src/vectorstores/prisma.ts#L432

It rejects every non-string array when building the SQL query, but in practice the argument of IN can have various types. After removing this line, the problem is solved. I will open a PR to fix this.
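As a rough sketch of what a relaxed check could look like, the snippet below accepts both strings and numbers and emits a parameterized IN clause. It is not the code from the PR and does not use Prisma's SQL helpers; `buildInClause`, `SqlFragment`, and the `$n` placeholder style are assumptions made for illustration.

```typescript
// Hypothetical sketch: an IN-clause builder that accepts strings and numbers.
// Names and placeholder style are illustrative, not taken from the library.
type InValue = string | number;

interface SqlFragment {
  text: string; // SQL text with positional placeholders ($1, $2, ...)
  values: InValue[]; // parameters bound to those placeholders, in order
}

function buildInClause(
  column: string,
  values: unknown,
  startIndex = 1,
): SqlFragment {
  if (!Array.isArray(values) || values.length === 0) {
    throw new Error("Invalid filter: IN operator requires a non-empty array.");
  }
  const checked: InValue[] = [];
  for (const value of values) {
    if (typeof value !== "string" && typeof value !== "number") {
      throw new Error(
        `Invalid filter: unsupported IN value of type ${typeof value}.`,
      );
    }
    checked.push(value);
  }
  // Quote the identifier and double any embedded quotes; the values themselves
  // are never interpolated into the SQL text.
  const quotedColumn = `"${column.replace(/"/g, '""')}"`;
  const placeholders = checked
    .map((_, index) => `$${startIndex + index}`)
    .join(", ");
  return { text: `${quotedColumn} IN (${placeholders})`, values: checked };
}

const fragment = buildInClause("id", [7, 8, 9]);
console.log(fragment.text, fragment.values);
```

Keeping the values as bound parameters means widening the accepted types does not open the query to injection; only the type check changes.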

dosubot[bot] commented 2 months ago

Hey there, @shan-mx! 👋 I'm here to help you with any bugs or questions you have while waiting for a human maintainer. Let's squash some bugs together!

The error you're encountering occurs because the Prisma vector store's filter builder only accepts an array of strings for the IN operator, while your argument is a number[]. The code currently rejects every non-string array when it builds the SQL query.

To resolve this, you need to convert the number[] to string[] before using it in the IN operator. Here's an example of how you can do this:

const numberArray = docs.map((doc) => doc.id); // Example number array
const stringArray = numberArray.map(String); // Convert number[] to string[]

// Now use stringArray with the IN operator
const resultOne = await vectorStore.similaritySearch("Hello world", 1, {
  id: {
    in: stringArray,
  },
});

console.log(resultOne);

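Converting the number[] to string[] satisfies the runtime check and avoids this error. Note, however, that as the issue description points out, the filter type for an Int id column is inferred as number[], so passing a string[] produces a TypeScript type error.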

Additionally, it seems that the issue stems from a specific line in the prisma.ts file that rejects non-string arrays. You mentioned that removing this line solves the problem, and you plan to open a PR to fix it. This approach should help address the root cause of the issue.

If you need further assistance, @jacoblee93 might be able to provide additional guidance.
