valderman / selda

A type-safe, high-level SQL library for Haskell
https://selda.link
MIT License
478 stars 58 forks source link

Selda creates convoluted SQL strings #102

Closed kuribas closed 3 years ago

kuribas commented 5 years ago

Hi, this simple query generates a three levels deep nested select query:

meQuery :: Query s (Row s Test)
meQuery = do
  t <- select testtable
  restrict $ t ! #name .== "MyName"
  pure t

generated query:

SELECT "name_0", "birthday_1", "married_2", "si_3", "mi_4", "li_5" FROM (SELECT "name_0", "birthday_1", "married_2", "si_3", "mi_4", "li_5" FROM (SELECT "name" AS "name_0", "birthday" AS "birthday_1", "married" AS "married_2", "si" AS "si_3", "mi" AS "mi_4", "li" AS "li_5" FROM "test") AS q0 WHERE ("name_0" = ?)) AS q1

That doesn't seem very efficient

valderman commented 5 years ago

Indeed, there's a lot of low-hanging fruit here, since Selda doesn't do any optimisation of the generated query at all. The rationale for this is that a decent query planner should be able to optimize queries like this fairly well, and that hammering out the API has taken precedence.

This particular example is probably as bad as it gets: consecutive selects would get merged, as would consecutive restricts, and the outermost query (which is only there to put the selected columns in order) is pretty pointless for such a simple query.

Either way, adding an optimization pass would definitely make queries - especially smallish ones - a lot more readable, and may yield better performance for more complex queries, so we should definitely have one.

kuribas commented 5 years ago

Right, this may be optimized away by the planner, and it would need to be benchmarked to see if this is a problem. However wouldn't adding an optimizer increase the time for generating the Query String? Perhaps it could be interesting to see if it can be compiled to a static string, even with different parameters, then the overhead of generating the string would be done only once.

kuribas commented 5 years ago

If it is possible to write the query implementation such that ghc can see the query string doesn't depend on the value of the parameters (which it doesn't), then you could write query functions like this:

myQuery = \p1 p2 -> query_body 

The optimiser would then float the query string up, and it would only be generated once. Then even a relatively expensive optimizing step wouldn't have any impact.