sqlc-dev / sqlc

Generate type-safe code from SQL
https://sqlc.dev
MIT License
13.54k stars 809 forks source link

Allow the specification of shared row datatypes #3595

Open pconstantinou opened 2 months ago

pconstantinou commented 2 months ago

What do you want to change?

Background

sqlc currently allows queries to generate a struct based on all the columns of a table using the table name.* syntax. This is useful but only satisfies a subset of use cases.

Problem

Many application scenarios use various formats for from and where clauses with the same columns in the result. For data-intensive applications where a large number of rows are returned, retrieving a subset of the columns is not just desirable but necessary.

A common use case is where the underlying table has a UUID-based user_id and the application wants varying subsets of the user's data. The table has a few other small columns where the application retrieves thousands of rows as a response. The majority of the data (in bytes) returned will be the user_id repeated over and over again unnecessarily.

Further, the use cases where a variety of queries that all return the same row structure are common.

sqlc currently creates a unique Row struct whenever specific column names are given.

Workarounds and Issues with Go

I've experimented with extensions with limited success. Extending the structures generated from sqlc by implementing getters and a common interface does little to address the problem. While individual row structures can have a common interface. The slice of rows returned can't be cast to a slice of its interface. Reallocating the whole slice generates a large amount of garbage and allocations.

I'd hoped that the new iterator would provide a good workaround, but benchmarking shows that iterators returning interfaces to structures also generate a new memory allocation for each row record when the struct is cast on an interface.

An unattractive solution is to use the new iterator and return a new struct copying each field.

Proposal (very open to other options)

In the query.sql file, introduce a new keyword that functions like --name:. This could be --row: which would provide the name of the common struct. All queries with the same row: definition would use the same struct. The compiler would generate an error if the struct defined by the row was not compatible (ie. contained the same columns with the same types).

sqlc's strength is the ability to identify query and data structure errors at compile time. That spirit should continue, so "compatibility" should be interpreted strictly. Returning a subset of the row's columns should not be considered compatible.

I recommend allowing the same row struct to be used on different tables provided that the data structures are compatible. This will allow a semblance of data polymorphism.

This approach would not allow embedding row structs in other structs but would allow the existing table embedding.

What database engines need to be changed?

No response

What programming language backends need to be changed?

Go