langchain-ai / langchainjs

šŸ¦œšŸ”— Build context-aware reasoning applications šŸ¦œšŸ”—
https://js.langchain.com/docs/
MIT License
12.16k stars 2.05k forks source link

SQLDatabaseChain creates database query without checking database table, fields. #2571

Closed yerycs closed 11 months ago

yerycs commented 11 months ago

Here is my js code

const connection = await createConnection({
    type: "mysql",
    host: "localhost",
    port: 3306,
    username: "root",
    password: "****",
    database: "fintron-invest-dev",
  });
  const db = new SqlDatabase({
    appDataSource: connection,
  });
  const llm = new OpenAI({
    temperature: 0,
    openAIApiKey: '****'
  });

  const dbChain = new SqlDatabaseChain({
    llm: llm,
    database: db,
    verbose: true,
  })

  const result = await dbChain.run("Show me the name of the biggest market capacity security");
  console.log(result);

When I run above code, it generates database query without checking real database table and fields. This is generated query

SELECT `name` FROM `securities` WHERE `market_capacity` = (SELECT MAX(`market_capacity`) FROM `securities`) LIMIT 5;

But there isn't securities table nor market_capacity fields in database.

When I run python langchain, it checks database and creates query with exact table name fields.

What is the issue in js langchain?

dosubot[bot] commented 11 months ago

šŸ¤–

Hello,

Thank you for reaching out and providing detailed information about your issue.

From your description, it seems like the LangChain Javascript/Typescript framework is generating SQL queries without validating the existence of the database tables and fields. This is indeed a feature of the LangChain framework, but it appears to not be functioning as expected in your case.

The LangChain framework uses the verifyListTablesExistInDatabase, verifyIncludeTablesExistInDatabase, and verifyIgnoreTablesExistInDatabase functions in the sql_utils.ts file to validate the existence of database tables before generating a query. However, these functions do not validate individual fields within these tables.

The SqlDatabaseChain class generates SQL queries by using a language model to translate natural language queries into SQL. This is done in the _call method of the class. It retrieves the table information from the database, prepares the inputs for the language model, and then calls the predict method of the LLMChain instance to generate the SQL command.

If the securities table or the market_capacity field do not exist in your database, the SqlDatabaseChain class should throw an error. However, it seems like this is not happening in your case.

This could be due to a bug in the LangChain framework or a misconfiguration in your code. To help us investigate this issue further, could you please provide the following information:

  1. The version of LangChain you are using.
  2. The exact error message you are receiving when you run your code.
  3. Any additional code or configuration that might be relevant to this issue.

This information will help us understand the issue better and provide a more accurate solution.

Thank you for your patience and cooperation.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a šŸ‘ if this is helpful and šŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

yerycs commented 11 months ago

@dosu-bot I am getting following SQLDatabaseChain error because there isn't table nor field in database and it query database with incorrect generated query.

QueryFailedError: ER_NO_SUCH_TABLE: Table 'fintron-invest-dev.securities' doesn't exist

What I want to know, why SQLDatabaseChain creates query without checking database tables and fields.

What should I do to create correct query in js?

Same again, python langchain SQLDatabaseChain generated correct db query with checking database tables and fields.

dosubot[bot] commented 11 months ago

šŸ¤–

Hello,

Thank you for bringing this issue to our attention. The SqlDatabaseChain class in the LangChain Javascript/Typescript framework is generating SQL queries without validating the existence of database tables and fields because it directly uses the input provided to it to generate the SQL queries. This is done in the _call method where it creates an instance of LLMChain and uses it to predict the SQL command based on the input provided.

To adjust it to generate correct queries similar to the Python version of LangChain's SQLDatabaseChain, you would need to add a validation step before generating the SQL command. This validation step should check the existence of the tables and fields in the database that are mentioned in the input.

Here is a simple way to add this validation:

// Add this method to the SqlDatabaseChain class
async validateTablesAndFields(tablesToUse: string[], tableInfo: any): Promise<void> {
  for (const table of tablesToUse) {
    if (!tableInfo[table]) {
      throw new Error(`Table ${table} does not exist in the database.`);
    }
    // Add field validation here if necessary
  }
}

// Modify the _call method to use the new validation method
async _call(
  values: ChainValues,
  runManager?: CallbackManagerForChainRun
): Promise<ChainValues> {
  // ...
  const tablesToUse = values.table_names_to_use;
  const tableInfo = await this.database.getTableInfo(tablesToUse);

  // Validate the tables and fields before generating the SQL command
  await this.validateTablesAndFields(tablesToUse, tableInfo);

  // ...
}

This code adds a new method validateTablesAndFields to the SqlDatabaseChain class that checks if the tables mentioned in the input exist in the database. If a table does not exist, it throws an error. You can extend this method to also validate the fields in the tables if necessary.

Then, in the _call method, it calls this new validation method before generating the SQL command. This ensures that the SQL command is only generated if the tables and fields in the input are valid.

Please note that this is a simple solution and might not cover all edge cases. You might need to adjust it to fit your specific needs. Also, please remember to handle the errors thrown by the validateTablesAndFields method in your code to prevent your application from crashing.

I hope this helps! If you have any other questions, feel free to ask.

Best regards, Dosu

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a šŸ‘ if this is helpful and šŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.