jerch closed this 2 years ago
Wow, this optimized update query scales much better with a bigger batch size than `bulk_update`. Numbers for `COMPUTEDFIELDS_BATCHSIZE = 1000` on 10000 records:
# sqlite - big benefit
>>> timer(lambda : update_dependent(SelfRef.objects.all()))
0.6897451877593994
# postgres - small benefit
>>> timer(lambda : update_dependent(SelfRef.objects.all()))
1.6358757019042969
# mariadb - big benefit
>>> timer(lambda : update_dependent(SelfRef.objects.all()))
0.7499890327453613
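The `timer` helper used in these measurements is not shown in the thread. A minimal sketch, assuming it simply runs the callable once and returns the elapsed wall-clock time in seconds, could look like this:

```python
import time


def timer(func):
    """Run func once and return the elapsed wall-clock time in seconds.

    Assumption: this mirrors how the numbers in this thread were taken;
    the actual helper is not part of the thread.
    """
    start = time.perf_counter()
    func()
    return time.perf_counter() - start
```

A single run like this is noisy, so the figures are best read as rough orders of magnitude rather than precise benchmarks.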
Edit: postgres actually performs better with a single big statement than with `execute_values`. Here are the corrected numbers for batchsize 1000, using the same query construction logic as for the other backends:
# postgres - now a big benefit too
>>> timer(lambda : update_dependent(SelfRef.objects.all()))
0.6264104843139648
(Phew - postgres is ahead again :smile_cat:)
To summarize things for our particular perf test case:
- `UPDATE FROM VALUES` gives a speedup of ~25x compared to `bulk_update`, leading to times of <1s for 10000 records of `SelfRef`. Compared to the old loop-saving, this is ~100 times faster.
- These numbers are for `SelfRef` (all model local fields) and will change significantly with more complicated cf dependencies.
- `UPDATE FROM VALUES` seems to be a neat trick to get multiple columns and rows updated very fast in one go, without the need to resort to a temp table in between. There is still a chance that for very big updates (>100k records at once) a temp table might perform better, but the added complexity does not seem worth it.
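To make the trick concrete, here is a minimal sketch of the `UPDATE FROM VALUES` idea against SQLite, using only Python's standard `sqlite3` module. It is not the implementation from this PR: the function names, the `new_vals` alias and the per-row fallback for SQLite older than 3.33 are assumptions made for illustration. It relies on SQLite naming the columns of a `VALUES` clause `column1`, `column2`, and so on.

```python
import sqlite3


def build_update_from_values(table, pk, fields, row_count):
    """Build one UPDATE ... FROM (VALUES ...) statement with '?' placeholders.

    column1 of the VALUES table carries the primary key,
    column2 onwards carry the new field values.
    """
    alias = "new_vals"  # hypothetical alias name
    assignments = ", ".join(
        f'"{f}" = "{alias}"."column{i + 2}"' for i, f in enumerate(fields)
    )
    row = "(" + ", ".join("?" * (len(fields) + 1)) + ")"
    values = ", ".join([row] * row_count)
    return (
        f'UPDATE "{table}" SET {assignments} '
        f'FROM (VALUES {values}) AS "{alias}" '
        f'WHERE "{table}"."{pk}" = "{alias}"."column1"'
    )


def fast_update(conn, table, pk, fields, rows, batch_size=1000):
    """Update many rows; each row is a tuple (pk_value, field1, field2, ...)."""
    if sqlite3.sqlite_version_info < (3, 33, 0):
        # Assumed fallback: UPDATE ... FROM is unavailable, update row by row.
        sets = ", ".join(f'"{f}" = ?' for f in fields)
        sql = f'UPDATE "{table}" SET {sets} WHERE "{pk}" = ?'
        conn.executemany(sql, [tuple(r[1:]) + (r[0],) for r in rows])
        return
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        sql = build_update_from_values(table, pk, fields, len(batch))
        # Flatten the batch into one parameter list matching the placeholders.
        params = [value for r in batch for value in r]
        conn.execute(sql, params)
```

Each batch becomes a single statement, so the batch size times the number of columns has to stay below the backend's bound-parameter limit. Other backends need a different spelling of the `VALUES` table, which this sketch does not cover.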
Currently the main issue with this optimization is the support in db backends:
VALUES()
recently, which is really cumbersome to test and maintain. Maybe a django management command can help to check the db compatibility.

Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
---|---|---|---|
computedfields/fast_update.py | 118 | 136 | 86.76% |
<!-- | Total: | 132 | 150 | 88.0% | -->
Totals | |
---|---|
Change from base Build 1715609233: | -1.2% |
Covered Lines: | 1338 |
Relevant Lines: | 1389 |
Left to do:
Playground to eval speed differences between certain update tricks.
`UPDATE FROM` gives a rather big speedup in postgres and sqlite >= 3.33 (testcase is to update 10000 records of `SelfRef`):

Edit: With mariadb >= 10.3 `UPDATE FROM` can be simulated without creating a temp table like this:
Update - runtime for mysql: