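Given `n` rows processed in `m` batches, `3n` statements are currently sent to the DB for updates: a `BEGIN`, an `UPDATE`, and a `COMMIT` for every row. If each batch's updates were wrapped in a single transaction, only `n + 2m` statements would be sent: one `UPDATE` per row, plus one `BEGIN` and one `COMMIT` per batch.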
Timings on a local Postgres table with 40000 rows, batch size 1000, anonymizing a single email field:
Before changes: 2m 52s
With transactions: 2m 26s (15% faster)
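For the benchmark above (40000 rows, batch size 1000, so `m` = 40), the statement counts can be worked out with a small sketch. The function names here are hypothetical and only illustrate the arithmetic; they are not part of the library.

```python
import math


def statements_per_row_commit(n_rows: int) -> int:
    """Statements sent when every row gets its own BEGIN, UPDATE, COMMIT."""
    return 3 * n_rows


def statements_per_batch_transaction(n_rows: int, batch_size: int) -> int:
    """Statements sent when each batch is wrapped in one transaction."""
    # m batches; the last batch may be smaller than batch_size
    m = math.ceil(n_rows / batch_size)
    # n UPDATEs plus one BEGIN and one COMMIT per batch
    return n_rows + 2 * m


before = statements_per_row_commit(40_000)
after = statements_per_batch_transaction(40_000, 1_000)
print(before, after)  # 120000 40080
```

So the number of statements drops to roughly a third, while the measured wall-clock gain is about 15%, which suggests most of the time is spent in the `UPDATE` statements themselves.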
I'm not sure if this would have undesired effects for others, so maybe this should be configurable?
Coverage decreased (-2.3%) to 91.541% when pulling d4f1c305fd747a8125e0b050a9cc5ced9947ec40 on kickbooster:transactions into db4f509dd9448fb2cfd25e4bb15c3d9116daead0 on sunitparekh:master.