Open · hwalinga opened this 1 year ago
Oh wow, thx for looking into this :smile_cat: . Will check it out later.
@hwalinga I looked through your changes - looks really good to me.
Still I am hesitating to add this for the following reason - in my early tests about different insert/update tricks I found the new `copy.write_row()` to outperform my manual value formatting done in python by 2-5 times*. And my value formatting only comes close to that speed by a really bad trick - it omits explicit formatting for types whose string repr I found to be understood by postgres' `TEXT` format, so it is def. less reliable than the default formatters provided by psycopg3. Thus I think the whole value formatting as currently done is at stake for psycopg3.
For reference also see https://gist.github.com/jerch/fd0fae0107ce7b153b7540111b2e89ab?permalink_comment_id=4097804#gistcomment-4097804.
TL;DR: With `copy.write_row()` we get safer value formatting and a speedup of 2-5 times*, depending on data types in the records.
[*] insert time on the interim temp table
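For illustration, a minimal sketch of that temp-table flow with psycopg3 (table and column names are made up here, this is not the lib's actual code):

```python
# Sketch: COPY the new values into an interim temp table via write_row(),
# then apply them with a single UPDATE ... FROM.
import psycopg

rows = [(1, "foo", 42), (2, "bar", 43)]  # (id, name, counter)

with psycopg.connect("dbname=test") as conn, conn.cursor() as cur:
    cur.execute(
        "CREATE TEMPORARY TABLE temp_update "
        "(id int, name text, counter int) ON COMMIT DROP"
    )
    # psycopg3 does the value formatting for us here
    with cur.copy("COPY temp_update (id, name, counter) FROM STDIN") as copy:
        for row in rows:
            copy.write_row(row)
    cur.execute(
        "UPDATE exampletable t SET name = s.name, counter = s.counter "
        "FROM temp_update s WHERE t.id = s.id"
    )
```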
Did some preliminary tests with an early psycopg3 rewrite with `write_row()` (values are the time in seconds to update 10k instances):

```
10000 instances
bulk_update: 69.83318078517914
fast_update:  2.205981731414795
copy_update:  0.37508416175842285
```

That's ~186x faster than `bulk_update` and ~6x faster than `fast_update`, while the psycopg2 variant is only ~2.5x faster than `fast_update`. So by using `write_row` we gain another perf bonus of ~3x, plus its implicit type checks. Def. the way to go.
Edit: The binary format gives another 5-10% perf bonus in my first tests. So we should get `BINARY` working, maybe with an option to resort to `TEXT` later on (in case someone has created a complex field without a suitable adapter).
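A rough sketch of what the `BINARY` variant looks like with psycopg3 (assumed table/columns, not the final implementation):

```python
import psycopg

def copy_rows_binary(cur: psycopg.Cursor, rows) -> None:
    # FORMAT BINARY needs exact type information upfront, hence set_types();
    # dropping "(FORMAT BINARY)" would resort to the TEXT protocol instead.
    with cur.copy(
        "COPY temp_update (id, name, counter) FROM STDIN (FORMAT BINARY)"
    ) as copy:
        copy.set_types(["int4", "text", "int4"])
        for row in rows:
            copy.write_row(row)
```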
More numbers on the psycopg3 rewrite, testing different settings like `BINARY`/`TEXT` and using django methods to prepare values. Trying to utilize django's `get_db_prep_save` method on fields for value adaption is quite expensive.

With `get_db_prep_save`:
```
# format: TEXT
10000 instances
fast_update:  2.045956015586853
copy_update:  0.48638129234313965
copy3_update: 0.7911456823348999

# format: BINARY
10000 instances
fast_update:  2.0687363147735596
copy_update:  0.5054255723953247
copy3_update: 0.6903678178787231
```
Without `get_db_prep_save`:
```
# format: TEXT
10000 instances
fast_update:  2.0186638832092285
copy_update:  0.5016790628433228
copy3_update: 0.3683037757873535

# format: BINARY
10000 instances
fast_update:  2.128391742706299
copy_update:  0.5029968023300171
copy3_update: 0.2735722064971924  <-- almost 8x faster than fast_update!
```
(Note that I had to disable some fields on the model for the update for the runtime comparison, as json and inet values must be decorated with their psycopg3 types beforehand, which `get_db_prep_save` does. With these values decorated manually on the source data, the difference between `copy_update` and `copy3_update` gets even bigger.)
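To make the "decorated manually" part concrete - json and inet values have to be wrapped in types psycopg3 knows how to dump (sketch with made-up data):

```python
import ipaddress
from psycopg.types.json import Jsonb

# a plain dict/string would not be adapted as jsonb/inet by write_row()
row = (1, Jsonb({"a": 1}), ipaddress.ip_address("192.168.0.1"))
```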
Perf-wise the best combination is relying on psycopg3's auto adaption (not using `get_db_prep_save`) with the `BINARY` protocol. That's a bit unfortunate, as `get_db_prep_save` is django's official way to mess with field values on db level, which we also should use here. (Note that `copy_update` doesn't use it either, as the encoders there are meant as a faster replacement for value translation into `TEXT` format.)
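For reference, the django-official preparation step discussed here boils down to something like this (a hedged sketch, the field iteration is assumed):

```python
from django.db import connection

def prepare_row(obj, fields):
    # run every value through its field's get_db_prep_save() so custom
    # fields get their official db-level conversion applied
    return tuple(
        f.get_db_prep_save(getattr(obj, f.attname), connection)
        for f in fields
    )
```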
In general it is a good thing to be more in line with django default features, so we prolly should use `get_db_prep_save` by default. Since this lib is about "as fast as possible", we could introduce two more arguments to deal with that:

- `_skip_valuepreparation` - skips the value preparation, the caller then has to provide values psycopg3 can adapt directly (e.g. `Jsonb(value)` for a jsonb field), should default to `False`
Edit: More perf squeezing - with also skipping the sanity checks, `BINARY` + `_skip_valuepreparation` ends up being really fast:

```
copy3_update: 0.22316670417785645  <-- now almost 10x faster than fast_update
```

Maybe `_skip_valuepreparation` should also skip the sanity checks, as the values must be in line anyway...
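A purely hypothetical call shape for such a flag (the name is from the discussion above, the surrounding API is assumed and not final):

```python
# caller guarantees psycopg3-ready values (e.g. Jsonb-wrapped json),
# so value preparation and sanity checks can be skipped
copy_update(MyModel, instances, fields=["name", "counter"],
            _skip_valuepreparation=True)
```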
I made the code compatible with psycopg 3. The code now works on psycopg2 and psycopg 3.

- I made a `compat_copy_from`, as `connection.copy_from` is not available in psycopg 3 anymore, and used that in the `copy_from` function (see the sketch after this list).
- I made a function to get the connection's encoding on both platforms.
- The Range types are now all just `Range`, so I made a compatibility block that changes all the different Range types to just `Range` on psycopg 3.

In the tests:

- The binary fields just return bytes now, so there is a compat function `tobytes` that just returns bytes on both platforms.
- The empty nested array is not reduced anymore on psycopg 3, so those tests are now skipped for psycopg 3.
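The compat entry point could look roughly like this (a sketch following the description above; the actual PR code may differ):

```python
try:
    import psycopg  # psycopg 3
    from psycopg import sql
    PSYCOPG3 = True
except ImportError:
    PSYCOPG3 = False

def compat_copy_from(cursor, file_obj, table, columns):
    if PSYCOPG3:
        # copy_from() is gone in psycopg 3 - stream the payload through copy()
        stmt = sql.SQL("COPY {} ({}) FROM STDIN").format(
            sql.Identifier(table),
            sql.SQL(", ").join(sql.Identifier(c) for c in columns),
        )
        with cursor.copy(stmt) as copy:
            while data := file_obj.read(65536):
                copy.write(data)
    else:
        # psycopg2 still ships the old fast path
        cursor.copy_from(file_obj, table, columns=columns)
```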
Performance
I am unsure if this is as performant as possible on psycopg 3, but at least it is now compatible with both platforms.