Open ethanresnick opened 1 year ago
Any thoughts on this @igalklebanov and @koskimas — in particular around the potential changes I outline at the end ($narrowRowType
and not collapsing cases when the original row type is a union type)?
The types are shredded, mangled, mutilated and shaken (not stirred) before they are used. There is no way to keep the original structure without a serious rewrite. I don't think it's worth it.
Fair enough. And what about $narrowRowType
? That seems much easier to add, but I may be missing something
@ethanresnick I haven't had a chance to read everything yet. Will do tomorrow hopefully.
I agree with @koskimas here. Making .where
narrow the query context seems quite complex and not worth it.
We should definitely explore adding a .$narrowTables
method.
Thanks Igal! I agree with all of that and would love to see something like .$narrowTables
! I can imagine different APIs for it, but I'm not sure what would be easiest for kysely to implement. Do you have any ideas how that would work/look exactly?
For example, is this basically what you had in mind? (Of course, it would obviously have to be expanded to cover some edge cases.)
class SelectQueryBuilder<DB, TB, O> {
$narrowTables<DB2 extends DB>():
O extends Selection<infer SelDB, infer SelTB, infer SE>
? SelectQueryBuilder<DB2, TB, Selection<DB2, SelTB, SE>>
: SelectQueryBuilder<DB2, TB, O>
}
While we flesh that out, I'd like to propose one small change to $narrowType
.
As I showed with my examples above, $narrowType
can often be made to work for the denormalized table use case. For example, in this query (playground):
const result = await kysely
.selectFrom('actions')
.select(['id', 'callback_url as callbackUrl'])
.where('type', '=', 'CALL_WEBHOOK')
.$narrowType<{ id: string, callbackUrl: string }>()
.execute();
Everything works perfectly because the type passed to $narrowType
is a single object type.
However, with this similar query (playground), the narrowing doesn't work well:
const result2 = await kysely
.selectFrom('actions')
.select(['callback_url as callbackUrl', 'type'])
.$narrowType<
| { type: 'CALL_WEBHOOK', callbackUrl: string }
| { type: 'DELETE_FROM_QUEUE', callbackUrl: null }
>()
.execute()
The type of result2
is
Array<{
id: string,
type: 'CALL_WEBHOOK' | 'DELETE_FROM_QUEUE',
callbackUrl: string | null
}>
rather than
Array<
| { type: 'CALL_WEBHOOK', callbackUrl: string }
| { type: 'DELETE_FROM_QUEUE', callbackUrl: null }
>
The issue is that the type passed into $narrowType
was a union, and $narrowType
destroyed the union's internal structure.
While I agree with @koskimas's point that preserving union type structure throughout the query builder would be near impossible and not worth it, I do think preserving the structure at the place where the user is explicitly trying to cast to a more-precise type is essential and fairly easy.
So, concretely, all I'm proposing is changing the definition of NarrowPartial
from:
export type NarrowPartial<S, T> = DrainOuterGeneric<{
[K in keyof S & string]: /* ... fancy conditional type */
}>;
to:
export type NarrowPartial<S, T> = DrainOuterGeneric<
T extends object
? { [K in keyof S & string]: /* same fancy conditional type as above */ }
: never
>;
That makes NarrowPartial
distribute over unions, by using the T extends object
to turn it into a conditional type, rather than squish the cases together if T
is a union type.
What do you think?
So, concretely, all I'm proposing is changing the definition of NarrowPartial from:
That's not enough. You can try changing the definition and see that it doesn't work. There are still multiple steps that mangle the type before it's output.
So, concretely, all I'm proposing is changing the definition of NarrowPartial from:
That's not enough. You can try changing the definition and see that it doesn't work. There are still multiple steps that mangle the type before it's output.
From the testing I did, it seems to be enough for many queries; it enabled improved types with no mangling in quite a few queries in our app. You're right that it's not a complete solution, and, in particular, I imagine queries with JOINs or other post-processing after the $narrowType call will still get mangled. Nevertheless, it seems like a clear, low-effort improvement.
I'd be happy to put up a PR for that change if you'd accept it.
I'd be happy to put up a PR for that change if you'd accept it.
If it doesn't break any existing use cases, then sure I'll merge it.
@koskimas Just an FYI for when any of us have time to come back to this issue (currently I don't): the optimization applied here to evaluate the Selection
type eagerly broke the FixSingleTableReturnedRowType
I demonstrated above, because it's no longer possible, when any columns in the selection have an alias, to get the original, unaliased name of the column.
I.e., it used to be the case that the table type could be something like:
type ActionRow = {
id: GeneratedAlways<string>,
type: 'CALL_WEBHOOK',
queue_id: null,
callback_url: string,
} | {
id: GeneratedAlways<string>,
type: 'DELETE_FROM_QUEUE',
queue_id: string,
callback_url: null,
}
Then, given a query like:
const query = db
.selectFrom('actions')
.select(['type as aliasedType', 'queue_id'])
.where('type', '=', 'CALL_WEBHOOK');
We could write:
const res = (await query.executeTakeFirstOrThrow()) as
FixSingleTableReturnedRowType<typeof query, { type: 'CALL_WEBHOOK }>
And the result would properly be { aliasedType: 'CALL_WEBHOOK', queue_id: null }
. I.e., queue_id
would've been narrowed from string | null
to just null
, based on the type
.
However, that was only possible by inspecting the query's Selection
type to discover that the type
column was selected (before aliasing); using that to narrow the ActionRow
type; then re-applying the aliases to the final returned row. Now, it's only possible to see that something called aliasedType
will show up in the result, but there's no way to use that to intersect/reduce the original ActionRow
type.
@koskimas At least until we come up with some more comprehensive solution, would it be possible to revert the optimization I mentioned above? (Or to investigate reverting it?) Obviously, that will cause some performance regression — although hopefully not a large one.
With that optimization in place, it's difficult/impossible to work with wide, denormalized, data-warehouse-style tables in a remotely type-safe way (because it breaks the utility type I showed earlier for narrowing such tables).
@koskimas @igalklebanov Would you accept a PR to stop evaluating the Selection type eagerly, per above?
Almost every project I've ever worked on has at least a tiny bit of denormalization, so I can't help but feel like this is one of the biggest missing pieces in Kysely's type safety.
Here's another case I ran into today (and one of many since I opened this issue):
type JobExecutionTable = {
jobId: string;
} & ({
status: "SUCCESS"
result: JobSuccessResult
} | {
status: "ERROR",
result: JobErrorResult;
})
Any chance we can jumpstart this issue somehow?
Hi folks. As always, thanks for making such a wonderful library! Kysely is my favorite tool for interacting with a db, and I've used a lot over the years :) Nevertheless, there is one limitation with kysely's typings that my team has run into repeatedly.
Let me say upfront that my team tries harder than most to avoid type assertions (because they can hide real issues during a schema change, which the compiler will no longer flag), and avoiding type assertions often requires very precise types. However, I understand that kysely tries to balance having precise types with keeping the code maintainable and keeping TS compilation times down. Therefore, I can totally imagine @koskimas saying that the workarounds my team has developed internally would be too complex to add to the kysely core. Still, I think it's worth exploring whether there are some changes to kysely core that would improve the UX for these sorts of cases, with an acceptable amount of complexity. I have a couple ideas at the end of this post, but don't know enough about kysely's internals to say exactly what's viable.
With that preface, the problem my team's dealing with is that we have tables where the row type is best described as a union and/or intersection of object types, rather than a simple object type, because the columns' types are correlated, but kysely is throwing away that extra structure.
Examples of the problem + ideal behavior
There are a few different cases where the desire for more complex row types comes up...
First, consider the following tagged union:
To store these actions in the database, I could use one table for each action
type
, or I could put all the action records in the same table. Using one table is very often the best solution: it allows other records to link to an action row of any type, with a single foreign key, while ensuring that every action record is exactly one of types. It also usually provides the best performance. In the OO world, squishing the hierarchy into one table get called "single table inheritance", and is quite common. As the example above shows, though, it applies just as much when working with plain-data tagged unions in an FP style.In my team's code base, we do have an
Action
type like this and we do store these different kinds of actions in a singleactions
table. Our action row type looks something like this:The nullability in the different cases is enforced in the db through a
CHECK
constraint that branches ontype
.We use the above type rather than something like:
because that type erases the underlying structure (e.g., it no longer enforces that
queue_id
can't be null whentype
is'DELETE_FROM_QUEUE'
).With
ActionRow
defined as a union, the ideal (but possibly unworkable) result would be for a query like:to automatically return rows where
callback_url
andcallback_headers
are known to be not null. This doesn't happen today. Of course, the main limitation is that thewhere()
call doesn't change the result type, which has already been discussed in https://github.com/kysely-org/kysely/issues/335#issuecomment-1435515901 and was part of the motivation for$narrowType
...$narrowType as a workaround
This case is a bit more complicated than the typical
$narrowType
use case, because we're trying to narrow the columns based on the value of a column that is not returned (type
). Still using$narrowType
to get a precise return type can work, but it has to be done carefully to be safe, and the result is often a lot of boilerplate.Specifically, something like:
isn't very safe, because it's implicitly depending on the current schema: if a future schema change allows any of the selected columns to take on a wider range of values, those new value types will be hidden in the result type, leading to runtime errors. (This is especially relevant when a previously-non-nullable column is made nullable, or when a JSON column is allowed to take on a wider range of JSON values.)
Therefore, the type given to
$narrowType
should be defined in terms of theActionRow
type, to keep the cast correct/safe in the face of schema changes, like:That type works, but it's a lot to write out manually on every query to a denormalized table, and the manual duplication of the selection in the
Pick
is annoying. It's also probably too involved for most users/teams to apply consistently and correctly — and it doesn't even cover the case of columns being aliased, which is fairly common (e.g., to convert to camel case) and which would make the type longer and more complicated.So, while I'm very glad
$narrowType
exists, I think we could find ways to make handling these cases much more ergonomic.Denormalized databases
While applying
$narrowType
on every query to a denormalized table isn't ideal, at least there will likely be only a small number of such tables (if any) in most projects that use a traditional RDBMS.Where these limitations get more frustrating, though, is when kysely is used with a database whose architecture demands a lot of denormalization, like Snowflake. In Snowflake, for example, JOINs can't use indexes but duplicating column values across rows has ~0 overhead, so it's normal and desirable for db rows to be highly denormalized and contain multiple entities squished into one row. It's then also much more common for queries to only select a small subset of the columns (which makes a much larger performance difference in Snowflake and other 'columnar' dbs). The result is that, for queries against these types of databases, basically every query to every table ends up needing a
$narrowType
call that gets very long and involved. This was the situation my team was in.Most of the row types in these dbs end up being properly modeled with nested union and intersection types, to account for all the denormalization. Intersection types slip in when an entity that is itself a union (like
Action
) then needs to be embedded into another row. E.g.My team's workaround
We developed some helper types that let us do the
$narrowType
cast more ergonomically. It looks like:In that example,
result
ends up (correctly) typed as{ id: string, callbackUrl: string }[]
.FixSingleTableReturnedRowType
takes the type of the query builder as a parameter, and it extracts the target table and the selection, including aliases, and uses them to build the final result type. Hiding all that complexity makes it viable for my whole team to use this. Meanwhile, the second type parameter encodes the "where" clause, which is easy to construct and verify just by looking at the query (i.e., filling in that second parameter doesn't require the developer to use any knowledge they have about the current schema, which could make the cast silently unsafe/invalid in the event of a schema change).For reference, here's a playground showing the implementation of
FixSingleTableReturnedRowType
and how it's applied. It's more complicated than I wish we had to write, but it works for our needs.Directions in Kysely
What my team's doing today isn't really a comprehensive solution (e.g., it only works on queries without joins and doesn't handle aliased raw expressions). It also, obviously, doesn't actually change anything in kysely's core, to make kysely produce the right types automatically more often; it just simplifies the process of applying
$narrowType
.In terms of what could be done in kysely itself...
I'm happy to accept that a
where()
call will have no effect on a query's return type. Even with that, though, there is one place where kysely loses type information that, if preserved, could be very valuable for these sorts of denormalized row use cases. Specifically, the issue is that the current implementation ofSelection
collapses the different cases if the row type is a union. That is, a query like:would ideally return rows of type:
I.e., I would want kysely to 'remember', from the original
ActionRow
type, that either bothcallback_url
andcallback_headers
will be null, or neither will be. Instead, the returned row type looks like:See playground here.
While that seems like a small issue, and I'm not sure how hard it is to fix, fixing it might be an important building block for these use cases.
A potentially easier, but very impactful, change would be to add a helper method called
$narrowRowType
(or whatever) that would narrow the type of the rows in the query's input relation, rather than narrowing the query result objects. The key difference would be that, if it's possible for columns to have correlated types, it can make sense to narrow the result type based on a column that isn't itself included in the result. Specifically, it would enable something like:$narrowRowType
would narrow the type of theactions
table's rows toActionRow & { type: 'CALL_WEBHOOK' }
, so the return type will be totally correct, withcallbackUrl
marked not null, even though thetype
column isn't selected (which it would have to be to use$narrowType
).Something like the above would also be a huge improvement over my team's solution today, which is harder to understand and which requires splitting up the definition of the query from the
$narrowType().execute()
calls (in order fortypeof query
to be read and passed to$narrowType
).Again, I'm not sure what's easy to implement in kysely and what's hard, so I'm just throwing out ideas and trying to start a discussion, rather than pushing for any particular solution. I do think, though, that tables with denormalized rows aren't super well served today.