haxetink / tink_sql

SQL embedded into Haxe

Subqueries #63

Open benmerckx opened 6 years ago

benmerckx commented 6 years ago

Subqueries have multiple uses. As far as I can tell they can appear in these scenarios:

As a source to select from

I'm only considering these in SELECT for now, but in theory they can be used in DELETE, UPDATE and INSERT as well. These subqueries must be aliased.

db.from({myalias: db.Table.where(...)})

This should not be very hard to add. We probably need a macro to make the table alias part of the Fields. It might be possible to do it macro-free if we go for something like from(source: Dataset, alias: String), but I think the object-literal API looks cleaner. It also feels like a better API for handling table aliases (instead of db.Table.as('alias')):

db.Table.join({alias: db.Table}).on(...)

My suggestion: if we're directly pointing to a TableSource we can simply alias the table; if it's a query (Dataset) we use a subquery in the formatter.
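For concreteness, here is a sketch of how both forms could read at the call site. User and Order (and their columns) are hypothetical tables, and these from/join shapes are the proposal above rather than existing tink_sql API:

```haxe
// Sketch only: User and Order (and their columns) are hypothetical,
// and these from/join shapes are the proposal above, not current API.

// 1. A subquery as a source, aliased by the object field name:
db.from({recent: db.Order.where(...)}).select(...);

// 2. A plain table alias through the same object-literal style,
//    instead of db.Order.as('o'):
db.User
  .join({o: db.Order}).on(o.userId == User.id)
  .where(...);
```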

As an expression

It's important to distinguish what data or columns we're trying to extract from the subquery. Because Dataset.select is a macro it's rather easy to determine this information but I had some trouble trying to find a good fit for Haxe's type system.

Subqueries can be used in expressions in these ways:

- As a scalar operand in a comparison, where the subquery must return a single column and a single row
- On the right-hand side of in(), where the subquery returns a single column with any number of rows

There are more kinds which I think we should ignore for now (I haven't looked up whether these are even available for SQLite): row subqueries and correlated subqueries.

Ideally the API could look like this:

db.Table.where(field == db.Table.select({field: Table.field}))
db.Table.where(field.in(db.Table.select({field: Table.field})))

Since there's a difference between selecting a single row and selecting multiple rows, we can require the subquery to be a Dataset.Limitable and force a limit(1) when it's used as a scalar. Any Dataset that selects a single column will do when multiple rows are allowed (as with in()).
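To make that distinction concrete, a sketch at the call site; the tables and columns (User, Score, Order) are hypothetical, and the limit(1) rule is the proposal above, not something tink_sql enforces today:

```haxe
// Hypothetical tables and columns; the limit(1) rule is the proposal above.

// Scalar context: the comparison needs exactly one value, so the subquery
// would have to be Limitable, with a limit(1) applied (or written explicitly).
db.User.where(User.maxScore == db.Score.select({value: Score.value}).limit(1));

// Multi-row context: any single-column Dataset works with in(),
// however many rows it returns.
db.User.where(User.id.in(db.Order.select({userId: Order.userId})));
```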

But I've had two problems trying to finish the implementation:

  1. Keeping track of what we're selecting. In the select macro we need to keep information about the type of the column we're selecting from (only when we're selecting from a single column). I tried the following, which works, but it doesn't really feel right, so I'd appreciate any suggestions for doing a better job: I added a typedef SingleField<T, Fields> = Fields where T holds the datatype. This propagates through where/groupBy/limit etc. because Fields is always passed down (see the sketch after this list).
  2. Trying to fit it into Expr so we can use the subqueries in where. This is hard because it requires every Dataset to be wrapped in an abstract, and I also had a lot of trouble getting the operator overloading to work only on Dataset<SingleField<T, _>, ...>. A solution might be adding Dataset.subquery() or something similar which casts it. Additionally, it's near impossible to check that our subquery actually belongs to the same Database. I'm not sure we want to go the distance to typecheck that, as it would require keeping the Database as a type parameter in Expr. One more issue pops up if we want to add #14: we'd have trouble comparing these expressions because they can come from different tables.
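To illustrate point 1, a minimal, self-contained sketch of the SingleField idea; Dataset below is a stripped-down stand-in for the real class, just to show why the marker survives the fluent calls:

```haxe
// At runtime (and for unification) SingleField is just Fields; the extra
// parameter T is only there so the datatype of the single selected column
// can be recovered later (e.g. by a macro inspecting the type).
typedef SingleField<T, Fields> = Fields;

// Toy Dataset: every fluent method returns Dataset<Fields, Result> with the
// same Fields, which is why the SingleField<T, _> marker survives
// where/groupBy/limit without any extra work.
class Dataset<Fields, Result> {
  public function new() {}
  public function where(cond:Fields->Bool):Dataset<Fields, Result> return this;
  public function limit(n:Int):Dataset<Fields, Result> return this;
}

// The select macro would then type a single-column selection such as
// select({field: Table.field}) on an Int column roughly as:
//   Dataset<SingleField<Int, {field:Int}>, {field:Int}>
```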

I'd appreciate some input, because I might be looking at it wrong. On the plus side, the actual formatting is really easy, so I think the above is all that's keeping me from finishing this.

back2dos commented 6 years ago

Well, this will be fun. Sorry, still have to think about how we can actually pull it off ^^

kevinresol commented 6 years ago

Any chance this will move forward a bit?

benmerckx commented 6 years ago

If my suggestions for the first item look good (db.from({myalias: db.Table...})) I can work it in. Having subqueries as expressions is where I would definitely need some suggestions first :)

kevinresol commented 6 years ago

Using the object field name as the alias is what I expected too, so that's welcome.

But I don't really have an idea for the second one.

I wonder if the upcoming (if ever) Haxe Inline Markup (tm) could be a solution to the craziness in the typing process?