Introspection with Multiple Schemas in Mysql

dannyb commented 1 year ago

Hi, I am really liking this library - thank you to all the supporters.

I am adding Kysely to an existing project. The database is MySQL and the tables have been organized into several different schemas/databases in MySQL. I am using kysely-codegen to generate the db.d.ts file.

I have not been able to get kysely-codegen to generate the types for all the different schemas. It generates only the types for the database specified in the DATABASE_URL. If I remove the database name from the DATABASE_URL, it generates no types.

I traced this back to the MysqlIntrospector class inside kysely.

export class MysqlIntrospector implements DatabaseIntrospector {
  readonly #db: Kysely<any>
...
  async getTables(
    options: DatabaseMetadataOptions = { withInternalKyselyTables: false }
  ): Promise<TableMetadata[]> {
    let query = this.#db
      .selectFrom('information_schema.columns as columns')
      .innerJoin('information_schema.tables as tables', (b) =>
        b
          .onRef('columns.TABLE_CATALOG', '=', 'tables.TABLE_CATALOG')
          .onRef('columns.TABLE_SCHEMA', '=', 'tables.TABLE_SCHEMA')
          .onRef('columns.TABLE_NAME', '=', 'tables.TABLE_NAME')
      )
      .select([
        'columns.COLUMN_NAME',
        'columns.COLUMN_DEFAULT',
        'columns.TABLE_NAME',
        'columns.TABLE_SCHEMA',
        'tables.TABLE_TYPE',
        'columns.IS_NULLABLE',
        'columns.DATA_TYPE',
        'columns.EXTRA',
      ])
--->  .where('columns.TABLE_SCHEMA', '=', sql`database()`)
      .orderBy('columns.TABLE_NAME')
      .orderBy('columns.ORDINAL_POSITION')
      .$castTo<RawColumnMetadata>()

On line 54 of mysql-introspector.ts, the where statement, limits the tables to the current database .where('columns.TABLE_SCHEMA', '=', sql'database()') .

For testing purposes, I removed the where clause in my local node_modules and re-ran kysely-codegen, it generated the types for all the schemas! I used the --exclude-pattern command line arg to exclude the MySQL built in schemas.

For info, I am using this to export this command to export the types:

kysely-codegen --dialect mysql --schema --exclude-pattern="{mysql,information_schema,performance_schema,sys,test,temp,tmp}.*" --camel-case --out-file "./db.d.ts"

What would be the best way to solve this? I want to get input on what the best solution would be. I am happy to submit a PR for this. I can see several options, but I am not sure which way to go.

Remove the where clause, getTables() returns all tables in the server. This could introduce bugs to existing code.
Add an option to DatabaseMetadataOptions to specify which schema(s) to get the tables for. The option could be a string or an array of strings. and the SQL would be columns.TABLE_SCHEMA in(?) or something similar.
Add an option to DatabaseMetadataOptions to specify to return all schema tables. If this option was true, it would not limit the tables to a specific schema

I think one or both of the last two options would work. Thanks

igalklebanov commented 1 year ago

Hey 👋

Thank you for the detailed issue. 🙏 And for offering to submit a PR! let's wait for greenlight on this one.

Given the following:

CREATE SCHEMA is a synonym for CREATE DATABASE. https://dev.mysql.com/doc/refman/8.0/en/create-database.html

I think we shouldn't treat MySQL differently than PostgreSQL when it comes to schemas. Since kysely-codegen is the only codegen option for MySQL outside of Prisma, and we can't expect it to align quickly, need to make sure we don't break interop.

In an ideal world, we should do:

remove = database() from where clause.
add tables.TABLE_TYPE in ('BASE_TABLE', 'VIEW') to where clause.

TABLE_TYPE BASE TABLE for a table, VIEW for a view, or SYSTEM VIEW for an INFORMATION_SCHEMA table. The TABLES table does not list TEMPORARY tables. https://dev.mysql.com/doc/refman/8.0/en/information-schema-tables-table.html

I don't think we need any exclusion options given the described steps - it's in parity with PostgreSQL's introspector.

What kind of bugs would this introduce to existing codebases? We could offer an immediately deprecated "current database only" type flag that adds = database() to the where clause to give consumers some time to migrate if required.

@koskimas WDYT?

koskimas commented 1 year ago

We originally didn't have the = database() filter but we added it based on user request(s). I don't think people treat MySQL schemas the same way as Postgres schemas. Especially locally, people might have all their projects' databases in a single MySQL server. On Postgres schemas live under a single database and are always a part of that database.

dannyb commented 1 year ago

Hi @igalklebanov and @koskimas,

Thank you for your insights. I concur that in many MySQL projects, all database objects are typically housed in one schema or database. However, in the context of this project, there's a unique organization across about seven databases or schemas.

Removing the = database() filter could introduce regressions, as all tables/views across all databases—including MySQL's built-in ones—might be returned. We need to strike a balance between flexibility and backward compatibility.

While kysely-codegen offers the flexibility of including or excluding patterns, not every use case may involve its introspection:

bash kysely-codegen --dialect mysql --schema --exclude-pattern="{mysql,information_schema,performance_schema,sys,test,temp,tmp}.*" --camel-case --out-file "./db.d.ts"

A few considerations and solutions:

Intention of getTables():
- Is the goal to retrieve all tables in a database or across the server? Depending on this, we might need to reevaluate the use of = database(). What does this do on Postgres for example? Should we make it consistent?
Exclude Built-in Databases:
- We could modify the = database() filter to not in('mysql','information_schema','performance_schema','sys','test','temp','tmp') to return user-created tables/views server-wide.
Flexible DatabaseMetadataOptions:
- To prevent regression, we could retain = database() as the default. However, we can enhance the flexibility of DatabaseMetadataOptions by introducing an option or options. I can see three needs:
  - default: keeps existing = database() filter
  - Option to have no filter.
  - Option to pass an array: it filters by the given schemas using in ('array0', 'array1', 'arrayn').
Filter by Table Type:
- I agree that adding the tables.TABLE_TYPE in ('BASE_TABLE', 'VIEW') filter would be beneficial.

Thanks again for discussing this.

kysely-org / kysely

Introspection with Multiple Schemas in Mysql #620