netzkolchose / django-fast-update

Faster db updates using UPDATE FROM VALUES sql variants.
MIT License

Make compatible with psycopg 3. #18

Open hwalinga opened 1 year ago

hwalinga commented 1 year ago

I made the code compatible with psycopg 3. The code now works on psycopg2 and psycopg 3.

I added a compat_copy_from helper, since connection.copy_from is no longer available in psycopg 3, and used it in the copy_from function.

I added a function that returns the connection's encoding on both drivers.
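For illustration, a minimal sketch of what such compatibility helpers could look like (helper names and shapes are assumed here, this is not the PR's actual code):

```python
# Illustrative sketch only: bridge the COPY and encoding APIs
# between psycopg2 and psycopg 3.
try:
    import psycopg  # noqa: F401 -- psycopg 3
    PSYCOPG3 = True
except ImportError:
    PSYCOPG3 = False

def compat_copy_from(cursor, file_obj, table, columns):
    """COPY <table> (<columns>) FROM STDIN on either driver."""
    if PSYCOPG3:
        sql = 'COPY {} ({}) FROM STDIN'.format(table, ', '.join(columns))
        with cursor.copy(sql) as copy:
            copy.write(file_obj.read())
    else:
        # psycopg2 still ships the cursor-level copy_from shortcut
        cursor.copy_from(file_obj, table, columns=columns)

def connection_encoding(conn):
    """Client encoding of the underlying connection on either driver."""
    return conn.info.encoding if PSYCOPG3 else conn.encoding
```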

psycopg 3 unifies all the Range types into a single generic Range class, so I added a compatibility block that maps the various psycopg2 Range types to just Range on psycopg 3.
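A rough sketch of such a block (the import paths are the documented ones for both drivers, the aliasing itself is assumed):

```python
# Sketch: on psycopg 3 all ranges are the single generic Range class,
# so alias the psycopg2 names to it; otherwise import the real types.
try:
    from psycopg.types.range import Range
    NumericRange = DateRange = DateTimeRange = DateTimeTZRange = Range
except ImportError:
    from psycopg2.extras import (
        DateRange, DateTimeRange, DateTimeTZRange, NumericRange)
```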

In the tests

Binary fields now simply return bytes, so there is a compatibility function tobytes that returns bytes on both drivers.
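A sketch of such a helper (the PR's actual tobytes may differ):

```python
def tobytes(value):
    # psycopg2 returns memoryview/buffer objects for bytea columns,
    # psycopg 3 returns plain bytes; normalize both to bytes.
    return bytes(value)
```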

Empty nested arrays are no longer reduced on psycopg 3, so those tests are skipped there.

Performance

I am unsure whether this is as performant as possible on psycopg 3, but at least it is now compatible with both drivers.

jerch commented 1 year ago

Oh wow, thx for looking into this :smile_cat:. Will check it out later.

jerch commented 1 year ago

@hwalinga I looked through your changes - looks really good to me.

Still I am hesitant to add this, for the following reason: in my early tests of different insert/update tricks I found the new copy.write_row() to outperform my manual value formatting done in Python by 2-5 times*. My value formatting only comes close to that speed through a really bad trick: it omits explicit formatting for types whose string repr I found to be understood by postgres' TEXT format, so it is definitely less reliable than the default formatters provided by psycopg3. Thus I think the whole value formatting as currently done is at stake for psycopg3.

For reference also see https://gist.github.com/jerch/fd0fae0107ce7b153b7540111b2e89ab?permalink_comment_id=4097804#gistcomment-4097804.

TL;DR: With copy.write_row() we get safer value formatting and a speedup of 2-5 times*, depending on the data types in the records.


[*] insert time on the interim temp table
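For illustration, a minimal psycopg 3 sketch of that write_row() COPY path into an interim temp table (connection string, table and column names are made up here):

```python
import psycopg

# write_row() lets the driver's own adapters format every value,
# so no hand-rolled TEXT formatting is needed.
with psycopg.connect('dbname=test') as conn, conn.cursor() as cur:
    cur.execute('CREATE TEMP TABLE tmp (id int, name text)')
    with cur.copy('COPY tmp (id, name) FROM STDIN') as copy:
        for row in [(1, 'foo'), (2, 'bar')]:
            copy.write_row(row)
```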

jerch commented 8 months ago

Did some preliminary tests with an early psycopg3 rewrite using write_row() (values are the time in seconds to update 10k instances):

```
10000 instances
bulk_update: 69.83318078517914
fast_update: 2.205981731414795
copy_update: 0.37508416175842285
```

That's ~186x faster than bulk_update and ~6x faster than fast_update, while the psycopg2 variant is only ~2.5x faster than fast_update. So by using write_row we gain another perf bonus of ~3x, plus its implicit type checks. Definitely the way to go.

Edit: The binary format gives another 5-10% perf bonus in my first tests. So we should get BINARY working, maybe with an option to fall back to TEXT later on (in case someone has created a complex field without a suitable adapter).
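For reference, a sketch of the BINARY variant in psycopg 3 (same made-up temp table as above; binary COPY cannot infer the column types, so they are declared via set_types()):

```python
import psycopg

rows = [(1, 'foo'), (2, 'bar')]
with psycopg.connect('dbname=test') as conn, conn.cursor() as cur:
    cur.execute('CREATE TEMP TABLE tmp (id int, name text)')
    with cur.copy('COPY tmp (id, name) FROM STDIN (FORMAT BINARY)') as copy:
        # binary mode needs the column types spelled out explicitly
        copy.set_types(['int4', 'text'])
        for row in rows:
            copy.write_row(row)
```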

jerch commented 7 months ago

More numbers from the psycopg3 rewrite, testing different settings like BINARY/TEXT and using django methods to prepare values.

Trying to utilize django's get_db_prep_save method on fields for value adaptation turns out to be quite expensive.

Perf-wise the best combination is relying on psycopg3's auto adaptation (not using get_db_prep_save) with the BINARY protocol. That's a bit unfortunate, as get_db_prep_save is django's official way to mess with field values on db level, which we should also use here. (Note that copy_update doesn't use it either, as the encoders there are meant as a faster replacement for value translation into TEXT format.)
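For context, the Django-sanctioned preparation step looks roughly like this (model, field names and queryset are invented for the sketch):

```python
from django.db import connection
from myapp.models import MyModel  # hypothetical model

instances = MyModel.objects.all()[:1000]
fields = [MyModel._meta.get_field(n) for n in ('name', 'counter')]
# one Python-level method call per field per instance is what
# makes this path expensive:
rows = [
    tuple(f.get_db_prep_save(getattr(obj, f.attname), connection)
          for f in fields)
    for obj in instances
]
```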

In general it is a good thing to be more in line with django's default features, so we probably should use get_db_prep_save by default. Since this lib is about "as fast as possible", we could introduce two more arguments to deal with that.

Edit: More perf squeezing: with the sanity checks also skipped, BINARY + _skip_valuepreparation ends up being really fast:

```
copy3_update: 0.22316670417785645  <-- now almost 10x faster than fast_update
```

Maybe _skip_valuepreparation should also skip the sanity checks, as the values must be in line anyway...