Expression Builder API feedback

ethanresnick commented 1 year ago

Thanks again for all the great work you're doing here! I just started using the new(ish) ExpressionBuilder API, and wanted to give a bit of feedback, which I hope will be helpful.

In general, I think the new design makes a lot of sense, and the new names apply better in the various contexts where expressions can be used than the old ones did. My only suggestion has to do with the abbreviated method names cmpr and bxp. Two things struck me about these:

Using abbreviations felt a bit out of place with the rest of the kysely API, which generally seems favors explicitness and readability over maximum concision (e.g., selectFrom rather than from, or something even shorter). To me, the three letters saved by using cmpr as opposed to compare doesn't seem worth it, given that cmpr is not an especially familiar abbreviation.
However, if the expression builder API is going to use abbreviations — and I can see a case for that in this API — then I think it might be worth adding abbreviated methods for specific, common binary comparisons, like eq, neq, gt, gte, etc. Having methods like that is, to me, both more intuitive (because eq is a much more common/familiar abbreviation than cmpr) and more concise (since the operator can be omitted).

So, I guess the above suggest three possible APIs:

qb.where(({ compare, and }) => and([
  compare('first_name', '=', 'Jennifer'),
  compare('last_name', '!=', 'Tremblay'),
  compare('updated_at', '>', lastMonth)
])

qb.where(({ cmpr, and }) => and([
  cmpr('first_name', '=', 'Jennifer'),
  cmpr('last_name', '!=', 'Tremblay'),
  cmpr('updated_at', '>', lastMonth)
])

qb.where(({ eq, neq, gt, and }) => and([
  eq('first_name', 'Jennifer'),
  neq('last_name', 'Tremblay'),
  gt('updated_at', lastMonth)
])

Personally, I think I prefer (1), for maximum clarity without too much verbosity. But, if maximal concision is the goal, then (3) makes more sense to me than two.

igalklebanov commented 1 year ago

Hey 👋

Thank you for taking the time to write this.

We went with such abbreviations because:

we didn't want these names to collide with SQL keywords, known and unknown. e.g. compare sounds like something some dialect might have, either full-match, or as part of a built-in function name.
we wanted the function names to "get out of the way" when reading/writing the code. The most important part of a cmpr(...) is its arguments, which should read like SQL to align with WYSIWYG. We don't need a fully qualified compare to realize there's a comparison going on, the arguments speak for themselves.

We thought about eq, neq, etc. and decided to not introduce these:

They're not aligned with WYSIWYG.
More code to maintain for ",'=',".length less characters you type/"copilot"?
They're redundant. Too many options to choose from that do the same thing.

ethanresnick commented 1 year ago

Thanks @igalklebanov! I think I agree with not adding eq, neq, etc... I'm curious if you all considered something like this?

 qb.where(({ $, and }) => and([
   $('first_name', '=', 'Jennifer'),
   $('last_name', '!=', 'Tremblay'),
   $('updated_at', '>', lastMonth)
 ]))

db.updateTable('person')
  .set(({ $ }) => ({  
     age: $('age', '+', 1)
  }))

I'm not necessarily arguing for that, but it would seem to be the logical extension of the current design.

I.e., some of the downsides with cmpr/bxp are that they are not names that a user would know to search for (e.g., in the API docs) or that they'd necessarily recognize as being the function that they're looking for (e.g., in an autocomplete dropdown). $ has both of those downsides, but then seems to do better at "getting out of the way" and letting the arguments speak for themselves.

In the above example, $ would probably be a shorthand for bxp, as bxp seems like the more general one, which people could use almost all the time (unless they run into TS performance/type depth issues).

koskimas commented 1 year ago

If we renamed cmpr to compare, what would bxp be renamed to? binaryExpression? That's way too long and would dominate the expressions. I think we should either abbreviate all or none of them.

We need separate binary expression and a comparison expression. If we only had bxp, you couldn't use an unknown operator without specifying the output type explictly.

cmpr produces a boolean expression by default bxp produces an expression that has the same type as the operands by default

$ is already reserved for stuff that's not related to SQL ($call, $if, etc).

Another reason for the abbreviated names was that people won't find the methods by searching no matter what we name them. They should be discovered from examples. The API documentation is filled with examples (which people don't seem to find). The new site will have those examples in a more accessible/searchable place.

koskimas commented 1 year ago

We could replace bxp and cmpr with a single function if we accept that using custom operators needs an explicit output type. We support most of the operators the built-in dialects have, but third party dialects might suffer.

I kind of like the idea of replacing both with $ but unfortunately that's not an option as stated in the previous comment. What else could we use?

ethanresnick commented 1 year ago

What else could we use?

Hmm…I think the obvious candidate would be _, since that’s the only other legal non-alphabetic character to start an identifier with, but that feels pretty overloaded to me in backend JS land (with lodash and the like) and also a bit hard to type constantly. So maybe better would be a very short alphabetic string that gestures at ‘expression’…maybe just e?

qb.where(({ e, and }) => and([
   e('first_name', '=', 'Jennifer'),
   e('last_name', '!=', 'Tremblay'),
   e('updated_at', '>', lastMonth)
 ]))

db.updateTable('person')
  .set(({ e }) => ({  
     age: e('age', '+', 1)
  }))

Or possibly xp, which sorta sounds like expression?

qb.where(({ xp, and }) => and([
   xp('first_name', '=', 'Jennifer'),
   xp('last_name', '!=', 'Tremblay'),
   xp('updated_at', '>', lastMonth)
 ]))

db.updateTable('person')
  .set(({ xp }) => ({  
     age: xp('age', '+', 1)
  }))

Could also do just x, which also kinda sounds like expression and has the secondary meaning of ‘a generic placeholder name’, which kinda applies here.

qb.where(({ x, and }) => and([
   x('first_name', '=', 'Jennifer'),
   x('last_name', '!=', 'Tremblay'),
   x('updated_at', '>', lastMonth)
 ]))

db.updateTable('person')
  .set(({ x }) => ({  
     age: x('age', '+', 1)
  }))

ethanresnick commented 1 year ago

Another direction would be is, which I actually quite like…makes sense in the boolean context especially, but works decently in other contexts too:

qb.where(({ is, and }) => and([
   is('first_name', '=', 'Jennifer'),
   is('last_name', '!=', 'Tremblay'),
   is('updated_at', '>', lastMonth)
 ]))

db.updateTable('person')
  .set(({ is }) => ({  
     age: is('age', '+', 1)
  }))

I think that’s my favorite by far: it’s actually a full word, while still being shorter than the current names. If we don’t like is in the non-boolean contexts, we could also make is the replacement for cmpr and think of something different for bxp

ethanresnick commented 1 year ago

I suspect an objection to is will be that it could be confused with the SQL operator, and I agree that’s an issue.

So here’s one more option: get, which is also a full and appropriate word, and is short enough that, with a two argument overload, it might be able to replace unary and all the shorthand methods (not, exists, neg) as well.

qb.where(({ get, and }) => and([
   get('first_name', '=', 'Jennifer'),
   get('last_name', '!=', 'Tremblay'),
   get('updated_at', '>', lastMonth),
   get('not', 'is_deleted')
 ]))

db.updateTable('person')
  .set(({ get }) => ({  
     age: get('age', '+', 1),
     backwards: get('-', 'age')
  }))

Of course, we can mix and match between all these options (eg, unifying cmpr, bxp, and unary, but calling that unified method e or exp or whatever).

I’m also not sure about the compiler performance hit of unifying unary with the other methods via a two-argument overload, but it does seem nice from an API POV.

igalklebanov commented 1 year ago

Left field idea:

eb.b('ref', '+', 5) // `b` for binary.
eb.b01('ref', '=', 'moshe') // cmpr replacement. `b` for binary, `01` for SqlBool hint :)

ethanresnick commented 1 year ago

Hmm, I think I'd rather try to unify bxp and unary, in which case leaning into "binary" in the name doesn't make sense. But, if that unification doesn't work for some reason, then b seems like a decent option. I'm not a fan of the 01, though: I think kysely should infer the return type from the operator (at least for the built-in operators, which seem like they cover 95%+ of the usage).

igalklebanov commented 1 year ago

Another one:

eb('ref', '+', 5)
eb('ref', '=', 'moshe')

ethanresnick commented 1 year ago

Love that!

Although it does mean that every other property (and, or, fn) can't be destructured.... assuming eb is the whole builder object, and not a function called eb on the builder. But maybe we could pass the builder twice?

igalklebanov commented 1 year ago

@ethanresnick

Love that!

Although it does mean that every other property (and, or, fn) can't be destructured.... assuming eb is the whole builder object, and not a function called eb on the builder. But maybe we could pass the builder twice?

Yeah that could work.

interface EB {
  (): {}
  eb: Omit<EB, 'eb'>
  fn: () => {}
  ...
}

({ eb, fn }) => ...

ethanresnick commented 1 year ago

I'm loving this!

But why not:

interface EB {
  (): {}
  eb: EB // no omit here
  fn: () => {}
  ...
}

To me, it makes sense that the eb property should just be a reference to the full EB object (since that's what it's named after, after all).

Combining that with unifying bxp/unary/cmpr and we'd have:

qb.where(({ eb, and }) => and([
   eb('first_name', '=', 'Jennifer'),
   eb('last_name', '!=', 'Tremblay'),
   eb('updated_at', '>', lastMonth),
   eb('not', 'is_deleted')
 ]))

db.updateTable('person')
  .set(eb => ({  
     age: eb('age', '+', 1),
     backwards: eb('-', 'age')
  }))

koskimas commented 1 year ago

This seems awesome! But let's really think this through. Deprecating the functions we just added to replace another set of deprecated functions is a bit ugly 😅 Let's make sure this one sticks or we end up changing the API again.

ethanresnick commented 1 year ago

This seems awesome! But let's really think this through. Deprecating the functions we just added to replace another set of deprecated functions is a bit ugly 😅 Let's make sure this one sticks of we end up changing the API again.

Totally agree: if we make changes, we should be reasonably confident we're making all the changes we'd want to make, and doing more than just bikeshedding on function names. I've been thinking a bit more deeply about this today, and will have a proposal soon.

In the meantime, I'm curious: how are custom (dialect-specific) binary operators supposed to work today with bxp/cmpr? I see that the type of OP is allowed to extend Expression<unknown>, but a custom operator name (just the name, with no operands) isn't really be an Expression, right? I think it should just be a string?

Also, it seems like the return type of bxp in the custom operator case would always be ExtractTypeFromReferenceExpression<DB, TB, RE>, where RE is the expression on the lhs. But that doesn't seem quite right either? I.e., why are we assuming that the operator will output the same type as the first operand?

By my read, if we were gonna merge bxp and cmpr (just taking these two as a base case to start with), I think the type would use three overloads, and be something like:

// This is the exact type that `cmpr` has today, except it constrains OP to _only_ known
// comparison operators, rather than `ComparisonOperatorExpression`.
function bxp<
  O extends SqlBool = SqlBool,
  RE extends ReferenceExpression<DB, TB> = ReferenceExpression<DB, TB>
>(
    lhs: RE,
    op: ComparisonOperator,
    rhs: OperandValueExpressionOrList<DB, TB, RE>
  ): ExpressionWrapper<O>;

// This is the exact type that `bxp` has today, except that it also replaces 
// BinaryOperatorExpression with just BinaryOperator, and simplifies the return type
// because ComparisonOperator is now handled above.
function bxp<
  RE extends ReferenceExpression<DB, TB>,
>(
  lhs: RE,
  op: BinaryOperator,
  rhs: OperandValueExpression<DB, TB, RE>
): ExpressionWrapper<ExtractTypeFromReferenceExpression<DB, TB, RE>>

// Finally, we handle the custom operator case as just string, and let the caller
// customize the return type with a new RES parameter.
function bxp<
  RE extends ReferenceExpression<DB, TB>,
  RES extends unknown = ExtractTypeFromReferenceExpression<DB, TB, RE>
>(
  lhs: RE,
  op: string,
  rhs: OperandValueExpression<DB, TB, RE>
): ExpressionWrapper<RES>

Does that seem right to both of you? I just wanna check that I'm not misunderstanding something fundamental before I go down a longer rabit hole of thinking more about how we might unify some of these functions.

Btw, I’ve never made overloads with different type parameters before — didn’t even realize that was legal — so I’m not sure how well TS will resolve different call sites against overloads like this, or how the compiler performance compares with unifying all of them through some fancy conditional types. But, again, I just wanted to start by checking my conceptual understanding of how custom operators should work, before we get into these details

igalklebanov commented 1 year ago

@ethanresnick have you tried this in a playground? I suspect it might break autocompletion which is a big no-no.

ethanresnick commented 1 year ago

@igalklebanov I haven't, but I will. Again, though, I'm less interested in "should this be overloads or conditional types" and more interested in: am I understanding correctly that the current setup for custom operators — namely, making the operator be an Expression rather than a string, and forcing the output type to match the output type of the LHS — doesn't really make sense?

It also seems a little weird to be that bxp and cmpr seem to accept custom operators, but unary doesn't? I’m also not sure if there’s any use case for custom operators with more than two operands (probably not, as most of those will just be functions?), but that doesn’t seem supported either.

ethanresnick commented 1 year ago

Quick update: I did test the above set of overloads in the playground, and autocomplete works! However, if I also try to add overloads for what is currently the unary function, things get weird: the autocomplete dropdown shows all the valid options for each argument (ie, the available column names, operator names, etc), but it also shows some suggestions that are inapplicable (but that would be valid for other overloads), thanks to this bug: https://github.com/microsoft/TypeScript/issues/26892

That’s too bad, because I was hoping this eb function could unify unary, binary, and nary operators (including and and or) to give:

qb.where(eb => eb('and', [
  eb('first_name', '=', 'Jennifer'),
  eb('last_name', '!=', 'Tremblay'),
  eb('updated_at', '>', lastMonth),
  eb('not', 'is_deleted')
]))

But it doesn’t seem like that’s currently possible while still having great autocomplete.

The rule would’ve been:

eb with three arguments is a binary expression, with the operator infixed
eb with two arguments, where the second is an expression, is a unary operator expression (with the operator in the first argument)
eb with two arguments, where the second is an array, is an nary operator expression (with the operator in the first argument)

koskimas commented 1 year ago

Let's not try to combine and, or or unary into the new function. They'd break autocompletion no matter how they are implemented.

ethanresnick commented 1 year ago

Ok. In that case, I think my only outstanding questions are the ones I posed above about custom operators: how are they passed, and are they supported only for binary operators?

koskimas commented 1 year ago

@ethanresnick I'm working on these changes here https://github.com/kysely-org/kysely/pull/565

Thanks again for the discussion here! I think the next versions is a big improvement.

koskimas commented 1 year ago

Ok. In that case, I think my only outstanding questions are the ones I posed above about custom operators: how are they passed, and are they supported only for binary operators?

We already support most operators. I think it's fine to require

sql`custom_op`

in the rare cases its needed. We can also add operators as we go.

ethanresnick commented 1 year ago

Wow, thanks for working on this @koskimas! It's awesome to see it landed 😁 🚀

As far as custom operators go, I don't use any custom operators myself, so I don't have a strong opinion, but using the sql tag does honestly seem a bit weird to me, simply because the sql tag returns an expression, and an operator is not an expression. So, I'd be inclined to accept the custom operator as a string, which doesn't need to break autocomplete. Then again, I can see some benefits to the template tag approach:

it avoids the (very remote) possibility of a built-in kysely operator conflicting with a custom operator that has the same name but different argument or return types
it means that, if the user makes a typo on one of the built-in operator names, TS will catch it (though, with autocomplete, such typos should be fairly rare).

So, I guess I could go either way. I do think though, that the inferred return type for expressions with custom operators seems a bit off, as I mentioned in the comment linked above.

kysely-org / kysely

Expression Builder API feedback #494