sqlc-dev / sqlc

Generate type-safe code from SQL
https://sqlc.dev
MIT License
13.64k stars 813 forks source link

Determine which indices are needed for efficient execution of a query #3343

Open mitar opened 7 months ago

mitar commented 7 months ago

What do you want to change?

Currently using SQL databases generally operate in the mode that every query is possible and they create an execution plan at runtime, based on data statistics and available indices. You have prepared statement you can use, but they are generally still done per-connection.

But in many cases, the set of queries one uses in apps is limited. sqlc is realizing that as well and takes an advantage to build client-side type-safe code to interface with SQL. In a way it compiles SQL into target code (e.g., Go) and abstracts SQL away.

I would like to suggest that sqlc could go a step further. During compilation time it could use data statistics to a) optimize the query itself b) create a SQL migration to add/remove indices to effectively execute those queries.

I know that doing this would be huge effort, but I think it is also very powerful. In this way I would bundle with a version of my app not just (compiled and optimized) queries, but also the set of indices (and data schema as well) those queries need for performant execution.

One approach to do this would be that during compilation time the sqlc would connect to a database with data, install hypothetical indices, and EXPLAIN all queries available (maybe a developer should provide test parameters), determine which indices are needed, and then create also a migration to install (or optionally remove) indices.

What database engines need to be changed?

PostgreSQL

What programming language backends need to be changed?

Go

mitar commented 7 months ago

There seems to exist something similar: https://github.com/ankane/dexter But it is not built-in into a compilation step.

lyda commented 7 months ago

Can I suggest a slightly different process?

How about generating a second function for each SELECT that does an EXPLAIN? So for example:

-- name: GetAuthor :one
SELECT * FROM authors
WHERE id = $1 LIMIT 1;

would generate a GetAuthor function but also an ExplainGetAuthor that would return explain results for that database. You could then add a test program that would run all those Explain statements and give results in your build process - whenever that was appropriate. I can imagine that folks won't always have DB access at compile time but a build pipeline would/could.

In addition this would allow people to track changes over time, decide which queries needed to be optimal, etc.

mitar commented 7 months ago

@lyda An interesting proposal. In a way it makes sense, it is like the benchmarks you would have in a test suite where tests would fail if expected performance would get too low.

It would mean that developers would have to manually add indices and then during testing you could check:

This could also be something similar to "coverage" - do all your tests at the end cover the indices you have and are there indices they could over but are missing. So you would run whole test suite and check that (instead of having an explicit tests to validate indices - I worry that then calls you might be doing would be repeated with calls you do during regular tests).

Which brings me then to the final point:

This might best live outside of a particular program/test suite/sqlc. You have test cases to test your code (which can include running against a realistic dump of data). Then you:

Sqlc would then only have to support some hook to be able to give you information about the schema to create all the hypothetical queries. But one could probably find a way to parse schema from sql also without sqlc to figure that out.

lyda commented 7 months ago

Exactly. sqlc will provide the Explain* functions but then it's up to you to run one or more of them in whatever environment you see fit. I think the real win here is if we can parse the response to EXPLAIN and provide meaningful feedback to the developer.

I might try and do it for cockroachdb if I get some time. Bank holiday coming up in a few weeks.

lyda commented 7 months ago

Note: This could even be run in production systems to track how certain statements are being optimised over time.