zio / zio-quill

Compile-time Language Integrated Queries for Scala
https://zio.dev/zio-quill
Apache License 2.0
2.15k stars 348 forks source link

Quill Streaming for inserting data (i.e. COPY stdin with Postgres)? #2195

Open mdedetrich opened 3 years ago

mdedetrich commented 3 years ago

So I have a usecase at work where we need to input a huge amount of data into a Postgres dataset (talking about millions of rows here, if not more), perform a SELECT query and them stream the data back out

When checking the Quill docs, I firstly noticed that streaming only applies when SELECT which isnt a problem and hence I am thinking that I just needed to implement the inserting of data via a stream.

It turns out that Postgres has its own streaming support that is done via the COPY command, see

Typically this is used to stream data directly from files however postgres also lets you specify STDIN/STDOUT as an input/output. JDBC implements this as a CopyManager which allows various types of InptStream/OutputStream/Reader.

Ideally at least when working with databases that support direct streaming (i.e. JDBC+Postgres) we should be using this to do our insert streaming since it actually means the output (for inserts) also does streaming.

mdedetrich commented 3 years ago

Okay so I did a bit more research, basically this is going to be a bit of work to implement mainly because of the reason that COPY is not a proper prepared statement its just raw SQL. This is only because COPY works on a single table with basic filtering (so prepared statements are pointless). This is also the same reason why COPY is so fast, it basically bypass entire parts of the Postgres engine that is not necessary.

After going through the code, it seems Quill doesn't natively have the ability to create SQL statements with input parameters that don't use prepared statements (basically Quill is built completely around prepared statements, there also appears to be an issue on this (see https://github.com/getquill/quill/issues/205)).

Ontop of this, COPY is quite limited in what it can do, i.e. you can only COPY from a single table and it seems that only basic WHERE filtering is allowed. This would mean having to adjust the macro AST so that one can only create valid COPY statements. This means having to do changes to the PostgresDialect to create a completely new type of statement (i.e. copy[MyTable] instead of query[MyTable]?)

What are your thoughts @deusaquilus ? I am thinking maybe this makes sense for the next iteration of Quill? (unless there is an example somewhere were we use don't use prepared statements at all).