oyvindberg / typo

Typed Postgresql integration for Scala. Hopes to avoid typos
https://oyvindberg.github.io/typo/
MIT License

Add example of how generated insertMany could look like for doobie #64

Closed · sbrunk closed 11 months ago

sbrunk commented 1 year ago

Currently this fails for UsersRowUnsaved, as we need a way to insert defaults in batch inserts.

The issue is that a list of UsersRowUnsaved could contain both real values and defaulted ones. What we want is probably something like what's asked for in https://github.com/tpolecat/doobie/issues/1219
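For illustration, here's roughly what such a mixed batch could look like, using a made-up Defaulted ADT and a trimmed-down row type (the names are just placeholders, not necessarily what typo generates):

```scala
import java.util.UUID

// Made-up stand-in for "either a concrete value or use the column default".
sealed trait Defaulted[+T]
object Defaulted {
  final case class Provided[T](value: T) extends Defaulted[T]
  case object UseDefault extends Defaulted[Nothing]
}

// Trimmed-down unsaved row: user_id has a database default (uuid_generate_v4()).
final case class UsersRowUnsavedSketch(userId: Defaulted[UUID], name: String)

// A single batch can mix rows that provide user_id with rows that want the default.
val batch = List(
  UsersRowUnsavedSketch(Defaulted.Provided(UUID.randomUUID()), "Alice"),
  UsersRowUnsavedSketch(Defaulted.UseDefault, "Bobby")
)
```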

oyvindberg commented 1 year ago

Thanks!

I have a suspicion that it's impossible to implement the batched function for unsaved rows because of the default parameters. I need to research it a bit more; I guess it comes down to Postgres syntax in the end.

If it's not possible, we can say that it's only supported for tables without default parameters. Or we can generate a new row type without the default columns. Or we can throw an exception if you provide a value for a column we assume you won't specify. Or something like that.

Suggestions very welcome here.

sbrunk commented 1 year ago

Yeah, from what I found (1, 2) and tried, it looks like it's not possible with JDBC to specify "use the default here" for individual rows in prepared statements (which is what doobie's updateMany uses). You can only do it for the whole batch, by removing the ? for that column in the insert, but then we'd need a new row type or have to throw exceptions for provided values, as you suggested.

The only thing I found where this works is using multirow inserts like this:

insert into users(user_id, name, last_name, email, password, created_at, verified_on)
VALUES (uuid_generate_v4(), 'Alice', 'Wonder', 'email1@asd.no', 'password', now(), now()),
       (uuid_generate_v4(), 'Bobby', 'Tables', 'email2@asd.no', 'password', default, now())

But since the statement is dynamic (it depends on the input length), we can't use the interpolator directly, and I'm not sure how to make this safe against SQL injection.
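One possible way around that, as a sketch only (reusing the Defaulted/UsersRowUnsavedSketch stand-ins from above; table and column names are illustrative): build the statement from doobie Fragments, so provided values stay bound parameters while DEFAULT is emitted as a constant keyword and no user input ever ends up in the SQL text.

```scala
import cats.data.NonEmptyList
import cats.syntax.foldable._
import doobie._
import doobie.implicits._
import doobie.postgres.implicits._ // Meta[UUID] for the uuid parameter

// A provided value stays a bound parameter (safe against injection); a defaulted
// field becomes the literal DEFAULT keyword, which contains no user input.
def field[T: Put](d: Defaulted[T]): Fragment =
  d match {
    case Defaulted.Provided(v) => fr"$v"
    case Defaulted.UseDefault  => fr"DEFAULT"
  }

// Dynamic multirow insert: the SQL shape depends on the batch, but every
// user-supplied value is still passed as a prepared-statement parameter.
def insertManySketch(rows: NonEmptyList[UsersRowUnsavedSketch]): ConnectionIO[Int] = {
  val values = rows
    .map(r => fr0"(" ++ field(r.userId) ++ fr"," ++ fr"${r.name}" ++ fr0")")
    .intercalate(fr",")
  (fr"insert into users(user_id, name) values" ++ values).update.run
}
```

That way the number of rows only changes how many placeholders are generated, never what goes into the SQL string itself.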

Interestingly, the Postgres JDBC driver has an option that rewrites batch inserts into multirow inserts (though only without returning), which indicates that multirow inserts are faster:

From https://jdbc.postgresql.org/documentation/use/ and this SO post.

> reWriteBatchedInserts (boolean), default false: This will change batch inserts from insert into foo (col1, col2, col3) values (1, 2, 3) into insert into foo (col1, col2, col3) values (1, 2, 3), (4, 5, 6); this provides 2-3x performance improvement.
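For completeness, that's just a connection property, something like this (host and database names are placeholders):

```scala
// Placeholder connection string; reWriteBatchedInserts is a pgjdbc property.
val jdbcUrl = "jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true"
```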

That all being said, it seems COPY does support default values for individual fields. Since it's also faster, perhaps it's indeed better not to spend too much effort on this approach here...

oyvindberg commented 12 months ago

Maybe COPY is the path of least resistance in the end. It's a bit of work because we need to come up with a valid string representation of all supported data types, but apart from that I already have half an implementation in a branch somewhere.

I think the first course of action is to write property-based tests for doobie, where we generate all kinds of crazy strings, arrays of strings, structs of arrays of strings, and check that the CSV encoding is correct. Then we copy/paste that into implementations for the other database libraries, and then the typo part of it is easy.
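Something along these lines, maybe (just a sketch: roundTripViaCopy stands in for a helper, not written yet, that encodes one string in COPY text format, streams it into a temp table with doobie, and reads it back):

```scala
import org.scalacheck.{Gen, Prop, Properties}

final class CopyTextEncodingProps(roundTripViaCopy: String => String)
    extends Properties("copy-text-encoding") {

  // Strings full of characters that need escaping in COPY text format:
  // tabs (field separator), newlines (row separator) and backslashes.
  private val trickyStrings: Gen[String] =
    Gen.listOf(Gen.oneOf(Gen.alphaNumChar, Gen.const('\t'), Gen.const('\n'), Gen.const('\\')))
      .map(_.mkString)

  property("strings survive the COPY round trip") =
    Prop.forAll(trickyStrings)(s => roundTripViaCopy(s) == s)
}
```

The same generators could then be lifted to arrays and struct-like types once the plain string case holds.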

oyvindberg commented 12 months ago

> That all being said, it seems COPY does support default values for individual fields

Did you confirm this? Is there a link? I haven't looked into it at all yet.

sbrunk commented 12 months ago

> That all being said, it seems COPY does support default values for individual fields
>
> Did you confirm this? Is there a link? I haven't looked into it at all yet.

Not confirmed yet, but here's the interesting bit in the docs: https://www.postgresql.org/docs/current/sql-copy.html

> DEFAULT: Specifies the string that represents a default value. Each time the string is found in the input file, the default value of the corresponding column will be used. This option is allowed only in COPY FROM, and only when not using binary format.

You can also set the default for all rows of a column:

> If a column list is specified, COPY TO copies only the data in the specified columns to the file. For COPY FROM, each field in the file is inserted, in order, into the specified column. Table columns not specified in the COPY FROM column list will receive their default values.

But the first one would of course be more interesting for unsaved values.
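A rough sketch of how that first option could be driven through the JDBC driver's CopyManager (PostgreSQL 16+ only; the '\D' sentinel and the column list are arbitrary choices here):

```scala
import java.io.StringReader
import org.postgresql.copy.CopyManager
import org.postgresql.core.BaseConnection

// PostgreSQL 16+: any field equal to the sentinel string ('\D', chosen arbitrarily)
// is replaced by the corresponding column's default value.
def copyUsersWithDefaults(conn: BaseConnection, rowsInCopyTextFormat: String): Long = {
  val copyApi = new CopyManager(conn)
  copyApi.copyIn(
    """COPY users(name, email, created_at) FROM STDIN (FORMAT text, DEFAULT '\D')""",
    new StringReader(rowsInCopyTextFormat)
  )
}

// Example payload: Alice provides created_at, Bobby falls back to the column default.
val rows = "Alice\temail1@asd.no\t2024-01-01 00:00:00\nBobby\temail2@asd.no\t\\D\n"
```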

sbrunk commented 12 months ago

I've just pushed the changes that would have to be generated for doing a bulk insert into UserRepo via COPY. The Text instances will be more involved for other types for sure, but here it seems to be quite simple.

sbrunk commented 12 months ago

Exciting news: I think I got it working with defaults too for UsersRowUnsaved!

Important caveat though: I realized that DEFAULT in COPY FROM is only available from Postgres 16, so it's not going to work on older versions. See https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-UTILITY

Going to push that soon so you can have a look.

oyvindberg commented 12 months ago

Very cool, and great that they fixed it in PG. I guess this leaves us with two paths forward :)

I also pushed my (very WIP) work towards the COPY API. It lacks an implementation of the insertStreaming method, valid Text instances for all custom types (though I've started, look at TypoLine for instance), and testing.

sbrunk commented 12 months ago

Awesome!

I've just pushed a working COPY example with defaults for Postgres >= 16. UsersRepoTest should pass if you build an image from the new Dockerfile and run it as a container.

There's another test failing now. I think the failure is related to the newer Postgres version, but I'm not sure:

📗 BSP: ReadPostgresTest:
📗 BSP: - don't fail while reading rows from postgres tables *** FAILED ***
📗 BSP:   doobie.util.invariant$NonNullableColumnRead: SQL `NULL` read at column 8 (JDBC type Char) but mapping is to a non-Option type; use Option here. Note that JDBC column indexing is 1-based.

oyvindberg commented 12 months ago

Wonderful, that's exactly what's needed!

I think it's OK if bulkInsertUnsaved only works for PG16+, since the other one will work for all PG versions. We can document that in a comment and in the docs.

It's also very easy to copy/paste these things and tweak them in your own codebase if you're on an older PG and need that specific piece of functionality.

oyvindberg commented 11 months ago

These will be merged as part of #65.