Thanks!
I have a suspicion that it's impossible to implement the batched function for unsaved rows because of the default parameters. Need to research it a bit more; I guess it comes down to Postgres syntax in the end.
If it's not possible, we can say that it's just supported for tables without default parameters. Or we can generate a new row type without the default columns. Or we can assume you don't specify values for the default columns and throw an exception if you do. Or something like that.
Suggestions very welcome here
Yeah, from what I found (1, 2) and tried, it looks like it's not possible with JDBC to specify "use default here" for individual rows in prepared statements (which is what doobie's `updateMany` uses). You can only do it for the whole batch by removing the `?` for that column in the insert, but then we'd need a new row type or throw exceptions for provided values, like you've suggested.
The only thing I found where this works is using multirow inserts like this:
```sql
insert into users(user_id, name, last_name, email, password, created_at, verified_on)
values (uuid_generate_v4(), 'Alice', 'Wonder', 'email1@asd.no', 'password', now(), now()),
       (uuid_generate_v4(), 'Bobby', 'Tables', 'email2@asd.no', 'password', default, now())
```
But then since the statement is dynamic (depends on the input length), we can't use the interpolator and I'm not sure how to make this safe against SQL injection.
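One possible way out (a hedged sketch, not verified against the codegen here): doobie can still bind every value through a `?` placeholder if the statement is assembled from `Fragment`s, so the dynamic length alone doesn't force string concatenation. The columns are trimmed from the example above, and this does not solve the per-field default problem:

```scala
import cats.data.NonEmptyList
import cats.syntax.foldable._
import doobie._
import doobie.implicits._

// Sketch only: composes one parameterized VALUES group per row, so every user
// value is still bound via `?` and stays injection-safe. Per-field defaults
// are exactly the part this does NOT handle.
def multirowInsert(rows: NonEmptyList[(String, String, String)]): Update0 = {
  val values = rows
    .map { case (name, lastName, email) => fr"(uuid_generate_v4(), $name, $lastName, $email)" }
    .intercalate(fr",")
  (fr"insert into users(user_id, name, last_name, email) values" ++ values).update
}
```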
Interestingly, there's an option of the Postgres JDBC driver that rewrites batch inserts (only without `returning`, though) into multirow inserts, indicating that multirow inserts are faster. From https://jdbc.postgresql.org/documentation/use/ and this SO post:

> reWriteBatchedInserts (boolean) Default false. This will change batch inserts from `insert into foo (col1, col2, col3) values (1, 2, 3)` into `insert into foo (col1, col2, col3) values (1, 2, 3), (4, 5, 6)`, which provides a 2-3x performance improvement.
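For reference, the option is just a connection property; host, database, and credentials below are placeholders:

```scala
import java.sql.DriverManager

// Placeholder URL and credentials; the reWriteBatchedInserts property itself is real.
val conn = DriverManager.getConnection(
  "jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true",
  "postgres",
  "password"
)
```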
That all being said, it seems COPY does support default values for individual fields. Since it is also faster, perhaps it's indeed better not to spend too much effort on this approach here...
Maybe COPY is the path of least resistance in the end. It's a bit of work because we need to come up with a valid string representation of all supported data types, but apart from that I already have half an implementation in a branch somewhere.
I think the first course of action is to write property-based tests for doobie, where we generate all kinds of crazy strings, arrays of strings, structs of arrays of strings, and check that the CSV encoding is correct. Then we copy/paste it into implementations for the other database libraries, and then the typo part of it is easy.
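Something like this, as a rough sketch: the `encode`/`decode` pair here is a simplified stand-in for the real COPY text-format escaping the `Text` instances would implement, and the property is the round trip.

```scala
import org.scalacheck.Arbitrary.arbitrary
import org.scalacheck.{Gen, Prop, Properties}

object CopyTextEncodingSpec extends Properties("copy-text-encoding") {

  // Simplified COPY text-format escaping: backslash, tab, newline, carriage return.
  def encode(s: String): String = s.flatMap {
    case '\\' => "\\\\"
    case '\t' => "\\t"
    case '\n' => "\\n"
    case '\r' => "\\r"
    case c    => c.toString
  }

  // Inverse of `encode`, used only to state the round-trip property.
  def decode(s: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < s.length) {
      if (s(i) == '\\' && i + 1 < s.length) {
        s(i + 1) match {
          case 't' => sb += '\t'
          case 'n' => sb += '\n'
          case 'r' => sb += '\r'
          case c   => sb += c
        }
        i += 2
      } else {
        sb += s(i)
        i += 1
      }
    }
    sb.toString
  }

  // "All kinds of crazy strings": printable ASCII, escape-heavy, full unicode.
  val weird: Gen[String] = Gen.oneOf(
    Gen.asciiPrintableStr,
    Gen.listOf(Gen.oneOf('\t', '\n', '\r', '\\', '"')).map(_.mkString),
    arbitrary[String]
  )

  property("encode/decode round trip") = Prop.forAll(weird)(s => decode(encode(s)) == s)
}
```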
> That all being said, it seems COPY does support default values for individual fields

Did you confirm this? Is there a link? I haven't looked into it at all yet.
Not confirmed yet, but here's the interesting bit in the docs: https://www.postgresql.org/docs/current/sql-copy.html

> DEFAULT: Specifies the string that represents a default value. Each time the string is found in the input file, the default value of the corresponding column will be used. This option is allowed only in COPY FROM, and only when not using binary format.
You can also set the default for all rows of a column:

> If a column list is specified, COPY TO copies only the data in the specified columns to the file. For COPY FROM, each field in the file is inserted, in order, into the specified column. Table columns not specified in the COPY FROM column list will receive their default values.

But the first one would of course be more interesting for unsaved values.
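If it works as documented, usage would look something like this (a sketch against the example table above; the marker string is arbitrary):

```sql
-- Any field equal to the marker string gets the column's default.
COPY users (user_id, name, last_name, email, password, created_at, verified_on)
FROM STDIN (FORMAT text, DEFAULT '__DEFAULT__');
```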
I've just pushed the changes that would have to be generated for doing a bulk insert into `UserRepo` via COPY. The `Text` instances will be more involved for other types for sure, but here it seems to be quite simple.
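For a simple wrapper type, I'd expect an instance to be little more than delegation, something like this (a sketch: `Email` is a made-up type, and it assumes `Text` exposes a `contramap`-style combinator in the doobie version used):

```scala
import doobie.postgres.Text

// Made-up wrapper type; the instance just reuses the String encoding.
// Assumes Text provides contramap (check the doobie-postgres version).
case class Email(value: String)

implicit val emailText: Text[Email] =
  Text[String].contramap(_.value)
```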
Exciting news: I think I got it working with defaults too, for `UsersRowUnsaved`!

Important caveat though: I realized that `DEFAULT` in `COPY FROM` is only available from Postgres 16, so it's not going to work in older versions. See https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-UTILITY
Going to push that soon so you can have a look.
Very cool, and great that they fixed it in PG. I guess this leaves us with two paths forward :)
I also pushed my (very WIP) work towards the COPY API. It lacks an implementation of the `insertStreaming` method, valid `Text` instances for all custom types (though I started; look at `TypoLine`, for instance), and testing.
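For the missing `insertStreaming`, I'd imagine something along these lines (a sketch assuming doobie-postgres's `Fragment#copyIn` and a generated `Text[UsersRow]` instance; the eventual signature may well differ):

```scala
import cats.implicits._
import doobie._
import doobie.implicits._
import doobie.postgres._
import doobie.postgres.implicits._
import fs2.Stream

// UsersRow and its Text instance are assumed to be generated elsewhere.
// Chunks the incoming stream and issues one COPY per chunk, summing row counts.
def insertStreaming(rows: Stream[ConnectionIO, UsersRow], chunkSize: Int = 10000): ConnectionIO[Long] =
  rows
    .chunkN(chunkSize)
    .evalMap { chunk =>
      sql"COPY users(user_id, name, last_name, email, password, created_at, verified_on) FROM STDIN"
        .copyIn(chunk.toList)
    }
    .compile
    .foldMonoid // sum of per-chunk row counts
```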
Awesome!
I've just pushed a working COPY example with defaults for Postgres >= 16. `UsersRepoTest` should pass if you build an image from the new Dockerfile and run it as a container.
There's another test failing now, but I think the failure is related to the newer Postgres version, not sure:

```
📗 BSP: ReadPostgresTest:
📗 BSP: - don't fail while reading rows from postgres tables *** FAILED ***
📗 BSP:   doobie.util.invariant$NonNullableColumnRead: SQL `NULL` read at column 8 (JDBC type Char) but mapping is to a non-Option type; use Option here. Note that JDBC column indexing is 1-based.
```
Wonderful, that's exactly what's needed!
I think it's OK if `bulkInsertUnsaved` only works for PG16+, since the other one will work for all PG versions. We can document that in a comment and in the docs.
It's also very easy to copy/paste these things and tweak them in your own codebase if you're on an old PG version and need that specific piece of functionality.
These will be merged as part of #65.
Currently failing for `UsersRowUnsaved`, as we need a way to insert defaults for batch inserts.
The issue is that a list of `UsersRowUnsaved` could contain both real values and defaulted ones. What we want is probably something like what was asked for in https://github.com/tpolecat/doobie/issues/1219
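One way to model that per-field choice (a sketch; the generated code may end up modelling it differently) is a small ADT on the unsaved row:

```scala
// Illustrative names, not the actual generated types.
sealed trait Defaulted[+T]
object Defaulted {
  final case class Provided[T](value: T) extends Defaulted[T]
  case object UseDefault extends Defaulted[Nothing]
}

// An unsaved row can then mix provided and defaulted fields per instance:
case class UsersRowUnsaved(
  email: String,                                  // required column
  createdAt: Defaulted[java.time.OffsetDateTime]  // column with a DB default
)
```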