kysely-org / kysely

A type-safe typescript SQL query builder
https://kysely.dev
MIT License
10.36k stars 263 forks source link

Case of extremely slow type checking #867

Open agj opened 7 months ago

agj commented 7 months ago

I have a query pattern that degrades tsc type checking performance to an outsized degree. It involves a nested query in selectFrom, and a simple where clause.

This is my most minimal reproduction:

const demoQuery = (db: Kysely<DB>) =>
  db
    .selectFrom((eb) =>
      eb
        .selectFrom("my_table")
        // Removing this `where` makes a dramatic difference in compilation time.
        .where((eb2) => eb2("my_table.id", "in", [1, 2, 3]))
        .selectAll()
        .as("test")
    )
    .selectAll();

With the where line indicated by the comment, running tsc --diagnostics --noEmit takes about 30 seconds, whereas after removing it it takes about a second.

TypeScript diagnostics with where:

Files:              435
Lines:           119235
Identifiers:     112580
Symbols:         258239
Types:           185820
Instantiations: 4743126
Memory used:    581215K
I/O read:         0.19s
I/O write:        0.00s
Parse time:       0.87s
Bind time:        0.27s
Check time:      30.30s
Emit time:        0.00s
Total time:      31.44s

TypeScript diagnostics WITHOUT where:

Files:              435
Lines:           119235
Identifiers:     112577
Symbols:          76912
Types:             2539
Instantiations:   10897
Memory used:    133318K
I/O read:         0.04s
I/O write:        0.00s
Parse time:       0.78s
Bind time:        0.23s
Check time:       0.25s
Emit time:        0.00s
Total time:       1.26s

My DB has a large number of tables, so I'm sure that's part of the issue, but I never had this problem until I added this particular pathological query.

I'm using:


Edit: Here's a Github gist with all of the files needed to reproduce this issue.

agj commented 7 months ago

Forgot to add that I'm immensely appreciative of the work done on Kysely. It's a great tool that lets us work with our database without fear. πŸ™

igalklebanov commented 7 months ago

Hey πŸ‘‹

Does this improve performance?

const demoQuery = (db: Kysely<DB>) =>
  db
    .selectFrom((eb) =>
      eb
        .selectFrom("my_table")
        .where("my_table.id", "in", [1, 2, 3])
        .selectAll()
        .as("test")
    )
    .selectAll();
agj commented 7 months ago

Hi! I'm away from my computer, as I'm on vacation right now, but as soon as I'm back I'll give it a try!

agj commented 7 months ago

@igalklebanov It did improve it quite a bit! But it's still not great…

Files:              435
Lines:           119236
Identifiers:     112578
Symbols:         246727
Types:             7593
Instantiations: 1355207
Memory used:    212941K
I/O read:         0.05s
I/O write:        0.00s
Parse time:       0.69s
Bind time:        0.30s
Check time:      11.30s
Emit time:        0.00s
Total time:      12.29s
kevin-johne commented 6 months ago

Hello there, and thank you for the amazing project, I introduced the typesafe query builder into our project which containes alot of tables. I wonder if Kysely is made for a scale of 100s and 100s of tables with 100s of complex queries.

Without Kyseley type checking is around 8 seconds, after starting generating the types of our db and only include ~100tables and written complex queries which use the expression builder to compose sub queries with joints. The query results are efficient and build for purpose And I start already plunching in the 60 seconds type checking. This makes the intellisense so slow and with delay that I'm not able to write more queries.

If this performance is odd I will go deeper into stats and post them here. Thank you

TypeScript version 5.1.6

igalklebanov commented 6 months ago

I'd strongly consider splitting data access to several Kysely instances with their own limited view of the world, if possible. I believe kysely-codegen has a filtering mechanism, should come in handy.

Kysely is built around, for the most part, string literal type comparisons. The more unique column and table names in DB interface, the slower type-checking gets and harder language server works.

kevin-johne commented 6 months ago

thank you for your quick reply, I will give this a try.

thelinuxlich commented 6 months ago

would a split into multiple schemas on the same interface, differentiated by the prefix, alleviate the performance issue?

koskimas commented 5 months ago

I'm not able to reproduce this. Any chance to get a full reproduction? A complete project I can run and profile?

agj commented 5 months ago

Yeah, sure. Here's a Github gist with the relevant files, including an anonymized version of the database types.

imwexpex commented 4 months ago

A big issue on our project as well. Please let me know if there is anything I could assist with.

koskimas commented 2 months ago

The issue here is basically due to the metric shit-ton of tables and columns and some O(N^X) type checking corner-case. The typescript type checker is so insanely complex that debugging this is difficult.

There probably is some way to make this case faster, but the only way to find it is blind trial and error. Or alternatively getting the help of one of the ~10 people on earth that actually understand how typescript internals work.

koskimas commented 2 months ago

A big issue on our project as well. Please let me know if there is anything I could assist with.

You can try profiling the typescript type checker and figuring out which type causes the issue.

koskimas commented 2 months ago

The issue is somehow related to selectFrom overloads. If I replace all overloads with this single signature:

  selectFrom<
    TE extends
      | TableExpression<DB, never>
      | ReadonlyArray<TableExpression<DB, never>>,
  >(
    from: TE,
  ): TE extends ReadonlyArray<infer T>
    ? SelectQueryBuilder<From<DB, T>, FromTables<DB, never, T>, {}>
    : TE extends keyof DB & string
      ? SelectQueryBuilder<DB, ExtractTableAlias<DB, TE>, {}>
      : SelectQueryBuilder<From<DB, TE>, FromTables<DB, never, TE>, {}>

the type checking time goes from 12 seconds to 0.7 seconds. That type also passes all Kysely tests except this one:

async function testGenericSelect<T extends keyof Database>(
  db: Kysely<Database>,
  table: T,
) {
  const r1 = await db.selectFrom(table).select('id').executeTakeFirstOrThrow()
  expectAssignable<string | number>(r1.id)
}

which tests generic usage of selectFrom which we don't really support anyway.

koskimas commented 2 months ago

But we have the same issue with updateTable and all other QueryCreator methods that have been split into multiple overloads 😞

koskimas commented 2 months ago

Fixed updateTable and deleteFrom the same way.

Now I get this when there are no type errors

Screenshot 2024-07-15 at 16 45 55

And this when there are:

Screenshot 2024-07-15 at 16 46 35
koskimas commented 2 months ago

There's now a PR for this. @agj Would you be able to test the PR branch in your codebase?

koskimas commented 2 months ago

@imwexpex Could you also build the PR branch and test if it fixes the peformance issues you're having?

  1. pull the branch
  2. npm install
  3. npm run build
  4. Point your project to the local build
imwexpex commented 2 months ago

@imwexpex Could you also build the PR branch and test if it fixes the peformance issues you're having?

  1. pull the branch
  2. npm install
  3. npm run build
  4. Point your project to the local build

Unfortunately, the fix did not have a big impact on my project

Latest release:

PR Branch:

image
koskimas commented 2 months ago

Unfortunately, the fix did not have a big impact on my project

1.86 seconds is not a long time. Looking at the numbers, you don't actually have any performance issues. Especially given that the parsing time is also 1.5 seconds. Type checking 3000 files in 1.86 seconds is actually insanely fast.

imwexpex commented 2 months ago

Unfortunately, the fix did not have a big impact on my project

1.86 seconds is not a long time. Looking at the numbers, you don't actually have any performance issues. Especially given that the parsing time is also 1.5 seconds.

Agree with you, there is no issue with compilation time, but rather with WebStorm/VSCode Intellisense. For some reason, hard queries kill IDE suggestions/linting.

This was also mentioned by @kevin-johne

This makes the intellisense so slow and with delay that I'm not able to write more queries.

koskimas commented 2 months ago

Well without any additional info, I won't be able to help you. I'm not seeing slow intellisense in vscode in any of our tests, including @agj's reproduction.

Intellisense uses the same type checker as the compiler.

koskimas commented 2 months ago

Also, if the issue was in intellisense and not in the build, why did you send the build benchmark? Did you actually check the intellisense performance with the new branch?

agj commented 2 months ago

Well @koskimas, not sure if this is good or bad news. The issue was for version 0.22.2, and I just confirmed that the issue persists in that version as well as in 0.22.3.

However, both 0.22.4 and the PR's version behave much better! In fact, indistinguishably so, both hover around 7 s for me, compared to the >60 s of the older versions. So, it looks like the changes in the PR don't do much when it comes to this particular issue, but something in v0.22.4 fixed the issue.

I'm very happy that the issue is under control, and I'm sorry that I wasn't able to identify in time that it was fixed in the latest release. πŸ™

koskimas commented 2 months ago

@agj I didn't actually have any perf issues in your reproduction either before the fix when the types were correct, but once I mistyped something, I got the 12 second build time and 100s of thousands of instantations.

If you take 0.22.4 and test this

const demoQuery = (db: Kysely<DB>) =>
  db
    .selectFrom((eb) =>
      eb
        .selectFrom("my_table_LOOOOL")
        // Removing this `where` makes a dramatic difference in compilation time.
        .where((eb2) => eb2("my_table.id", "in", [1, 2, 3]))
        .selectAll()
        .as("test")
    )
    .selectAll();

versus the branch, do you see a difference?

imwexpex commented 2 months ago

Also, if the issue was in intellisense and not in the build, why did you send the build benchmark? Did you actually check the intellisense performance with the new branch?

Sure, I checked. 3+ sec lag for any suggestion and the RAM usage goes up to 3GB on the opened file.

I've tried to reproduce the same query in your test file, and it looks like everything works smoothly. So probably an issue somewhere else...

imwexpex commented 2 months ago

Query example, to be clear:

db
    .selectFrom((eb) =>
      eb
        .selectFrom('my_table')
        .where('my_table.id', 'in', [1, 2, 3])
        .where('my_table.col_164b7896ec8e770207febe0812c5f052', 'is', 12)
        .select((eb) => [
          jsonObjectFrom(
            eb
              .selectFrom('table_1474a7e0348b1ca363ee4a2a5dc4a1ec')
              .selectAll('table_1474a7e0348b1ca363ee4a2a5dc4a1ec')
              .whereRef(
                'my_table.id',
                '=',
                'table_1474a7e0348b1ca363ee4a2a5dc4a1ec.id',
              ),
          ).as('my_table'),
          jsonArrayFrom(
            eb
              .selectFrom('table_0b5ac72e03509e06683edcba4b3887ab')
              .selectAll('table_0b5ac72e03509e06683edcba4b3887ab')
              .where(({ eb: eb1, ref, selectFrom }) =>
                eb1(
                  ref('table_0b5ac72e03509e06683edcba4b3887ab.id'),
                  'in',
                  selectFrom('table_da9bc7793b7e3784c03d55128145b7e3')
                    .where('id', '=', 1)
                    .select('id'),
                ),
              )
              .whereRef(
                'my_table.id',
                '=',
                'table_0b5ac72e03509e06683edcba4b3887ab.id',
              )
              .select((eb1) => [
                jsonObjectFrom(
                  eb1
                    .selectFrom('table_ede804aecaaa0ad1a5afffb9b2a0d927')
                    .whereRef(
                      'table_ede804aecaaa0ad1a5afffb9b2a0d927.id',
                      '=',
                      'table_0b5ac72e03509e06683edcba4b3887ab.col_1b6b0183f1a79f6d53cc5f8bd5f315d7',
                    )
                    .where(
                      'table_ede804aecaaa0ad1a5afffb9b2a0d927.col_80ae6f72f9c170f9e1bcae48c1677802',
                      '!=',
                      12,
                    )
                    .selectAll('table_ede804aecaaa0ad1a5afffb9b2a0d927')
                    .select((eb2) => [
                      jsonObjectFrom(
                        eb2
                          .selectFrom('table_81859a5a7cf340e00333e133ee83c6a3')
                          .select([
                            'col_0191c0aa0d39de591b5236b304496123',
                            'col_08de056b6d3139a95d0aaf99f8e3c52e',
                            'col_195367e97caee2b9ed726514b7a38efc',
                          ])
                          .whereRef(
                            'table_ede804aecaaa0ad1a5afffb9b2a0d927.col_a7f4797b61012da3a9c1a8049c8a4974',
                            '=',
                            'table_81859a5a7cf340e00333e133ee83c6a3.col_aa6b7c7a9c7a177e3f1ba452783eb63b',
                          ),
                      ).as(
                        'table_ede804aecaaa0ad1a5afffb9b2a0d927.col_aa6b7c7a9c7a177e3f1ba452783eb63b',
                      ),
                      jsonObjectFrom(
                        eb2
                          .selectFrom('table_1084015020d8bed5c99852780cd060b5')
                          .select([
                            'table_1084015020d8bed5c99852780cd060b5.col_665cebec3ce1380ce4e3c90b1bedcbf5',
                          ])
                          .whereRef(
                            'table_1084015020d8bed5c99852780cd060b5.id',
                            '=',
                            'table_1084015020d8bed5c99852780cd060b5.col_53ca581ab7a5aef5b659bc27c81d5d51',
                          ),
                      ).as(
                        'table_1084015020d8bed5c99852780cd060b5.col_53ca581ab7a5aef5b659bc27c81d5d51',
                      ),
                    ]),
                ).as('table_ede804aecaaa0ad1a5afffb9b2a0d927.data'),
                jsonObjectFrom(
                  eb1
                    .selectFrom('table_736d84cc80e257dc78371b1eb0fb6b20')
                    .selectAll('table_736d84cc80e257dc78371b1eb0fb6b20')
                    .whereRef(
                      'table_736d84cc80e257dc78371b1eb0fb6b20.id',
                      '=',
                      'table_0b5ac72e03509e06683edcba4b3887ab.id',
                    ),
                ).as('table_736d84cc80e257dc78371b1eb0fb6b20'),
              ])
              .orderBy('col_15af44699327f9a6bb2f12da27fdf0ce', 'desc'),
          ).as('data'),
        ])
        .as('test'),
    )
    .selectAll()
agj commented 2 months ago

@koskimas You're right! I could reproduce the issue when there's a mistyped table name. Before the PR it's ~28 s, with the PR it's ~7 s. πŸŽ‰

koskimas commented 2 months ago

@imwexpex There seems to be something in jsonArrayFrom and jsonObjectFrom types combined with inline nested queries + huge database that kills typescript language server. I'll take a look at those types.

Having said that, you can already avoid the slowness and arguably make your queries much more readable by splitting them to helper functions like this:

  const result = await db
    .selectFrom((eb) =>
      eb
        .selectFrom('my_table')
        .where('my_table.id', 'in', [1, 2, 3])
        .where('my_table.col_164b7896ec8e770207febe0812c5f052', 'is', 12)
        .select((eb) => [
          relation1(eb.ref('my_table.id')).as('my_table'),
          relation2(eb.ref('my_table.id')).as('data'),
        ])
        .as('test'),
    )
    .selectAll()
    .execute()

  function relation1(foreignKey: Expression<number>) {
    return jsonObjectFrom(
      db
        .selectFrom('table_1474a7e0348b1ca363ee4a2a5dc4a1ec')
        .selectAll('table_1474a7e0348b1ca363ee4a2a5dc4a1ec')
        .where('table_1474a7e0348b1ca363ee4a2a5dc4a1ec.id', '=', foreignKey),
    )
  }

  function relation2(foreignKey: Expression<number>) {
    return jsonArrayFrom(
      db
        .selectFrom('table_0b5ac72e03509e06683edcba4b3887ab')
        .selectAll('table_0b5ac72e03509e06683edcba4b3887ab')
        .where(({ eb, ref, selectFrom }) =>
          eb(
            ref('table_0b5ac72e03509e06683edcba4b3887ab.id'),
            'in',
            selectFrom('table_da9bc7793b7e3784c03d55128145b7e3')
              .where('id', '=', 1)
              .select('id'),
          ),
        )
        .where('table_0b5ac72e03509e06683edcba4b3887ab.id', '=', foreignKey)
        .select((eb1) => [
          relation3(
            eb1.ref(
              'table_0b5ac72e03509e06683edcba4b3887ab.col_1b6b0183f1a79f6d53cc5f8bd5f315d7',
            ),
          ).as('data'),

          relation4(eb1.ref('table_0b5ac72e03509e06683edcba4b3887ab.id')).as(
            'data2',
          ),
        ])
        .orderBy('col_15af44699327f9a6bb2f12da27fdf0ce', 'desc'),
    )
  }

  function relation3(foreignKey: Expression<string | null>) {
    return jsonObjectFrom(
      db
        .selectFrom('table_ede804aecaaa0ad1a5afffb9b2a0d927')
        .where('table_ede804aecaaa0ad1a5afffb9b2a0d927.id', '=', foreignKey)
        .where(
          'table_ede804aecaaa0ad1a5afffb9b2a0d927.col_80ae6f72f9c170f9e1bcae48c1677802',
          '!=',
          12,
        )
        .selectAll('table_ede804aecaaa0ad1a5afffb9b2a0d927')
        .select((eb) => [
          jsonObjectFrom(
            eb
              .selectFrom('table_81859a5a7cf340e00333e133ee83c6a3')
              .select([
                'col_2f76db193eac6ad0f152563313673ac9',
                'col_4d742b2f247bec99b41a60acbebc149a',
                'col_6137cde4893c59f76f005a8123d8e8e6',
              ])
              .whereRef(
                'table_ede804aecaaa0ad1a5afffb9b2a0d927.col_a7f4797b61012da3a9c1a8049c8a4974',
                '=',
                'table_81859a5a7cf340e00333e133ee83c6a3.col_aa6b7c7a9c7a177e3f1ba452783eb63b',
              ),
          ).as(
            'table_ede804aecaaa0ad1a5afffb9b2a0d927.col_aa6b7c7a9c7a177e3f1ba452783eb63b',
          ),

          jsonObjectFrom(
            eb
              .selectFrom('table_1084015020d8bed5c99852780cd060b5')
              .select([
                'table_1084015020d8bed5c99852780cd060b5.col_665cebec3ce1380ce4e3c90b1bedcbf5',
              ])
              .whereRef(
                'table_1084015020d8bed5c99852780cd060b5.id',
                '=',
                'table_1084015020d8bed5c99852780cd060b5.col_53ca581ab7a5aef5b659bc27c81d5d51',
              ),
          ).as(
            'table_1084015020d8bed5c99852780cd060b5.col_53ca581ab7a5aef5b659bc27c81d5d51',
          ),
        ]),
    )
  }

  function relation4(foreginKey: Expression<number>) {
    return jsonObjectFrom(
      db
        .selectFrom('table_736d84cc80e257dc78371b1eb0fb6b20')
        .selectAll('table_736d84cc80e257dc78371b1eb0fb6b20')
        .where('table_736d84cc80e257dc78371b1eb0fb6b20.id', '=', foreginKey),
    )
  }