netzkolchose / django-fast-update

Faster db updates using UPDATE FROM VALUES sql variants.
MIT License

inconsistency in duplicate skipping #11

Closed: jerch closed this issue 2 years ago

jerch commented 2 years ago

The current duplicate skipping is not stable at batch borders: skipped duplicates shift the batch offsets of the following data, which can lead to inconsistent follow-up skipping:

```
pks: [1,2,3,3,2,2,1,1,4], batch_size = 4
fast_update creates: [1,2,3,x,x,x,x,x,4] --> [[1,2,3,4]]
bulk_update creates: [[1,2,3,x],[2,x,1,x],[4]] --> [[1,2,3], [2,1], [4]]
```

This is caused by the prebatching done in bulk_update vs. the aggregated batching in fast_update, so the updates that actually get applied differ in the end.
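For illustration, here is a small standalone sketch (not the library's actual code) that mimics both skipping strategies and reproduces the example above:

```python
def prebatched(pks, batch_size):
    """bulk_update style: slice into batches first, then skip duplicates per batch."""
    result = []
    for i in range(0, len(pks), batch_size):
        seen, kept = set(), []
        for pk in pks[i:i + batch_size]:
            if pk not in seen:
                seen.add(pk)
                kept.append(pk)
        result.append(kept)
    return result

def aggregated(pks, batch_size):
    """fast_update style: skip duplicates while filling a batch up to batch_size."""
    result, current, seen = [], [], set()
    for pk in pks:
        if pk in seen:
            continue
        seen.add(pk)
        current.append(pk)
        if len(current) == batch_size:
            result.append(current)
            current, seen = [], set()
    if current:
        result.append(current)
    return result

pks = [1, 2, 3, 3, 2, 2, 1, 1, 4]
print(prebatched(pks, 4))   # [[1, 2, 3], [2, 1], [4]]
print(aggregated(pks, 4))   # [[1, 2, 3, 4]]
```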

NB: This makes me wonder whether the original behavior is wanted at all - the fact that a second update gets through just because it ended up in a different batch looks like a surprising side effect, especially since batch_size is only meant to give some control over the query load. Wouldn't it be better to treat a single bulk_update call as atomic from the user's perspective, and thus either filter all duplicates from the whole changeset or disallow duplicates altogether? --> https://code.djangoproject.com/ticket/33672

jerch commented 2 years ago

Oh well, the Django ticket got closed as a duplicate, pointing to one that got resolved as "yes, it is a mistake, but not worth fixing, let's just document it". Wth? Idk what's going on there - such handwaving normally does not lead to a better software outcome. Thus I asked for reconsideration, which is very unlikely to happen (the communication is weirdly one-sided anyway).

So this leads to a change of plans on how to deal with duplicates, which gives consistent behavior regarding duplicates that users can rely on, with no side effect from batch_size anymore.
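One way to get such batch_size-independent behavior is to deduplicate over the whole changeset before batching (the first of the two options mentioned above). A minimal sketch, assuming objects expose a pk attribute - not necessarily the fix as implemented, which could just as well disallow duplicates entirely:

```python
from collections import namedtuple

def dedup_whole_changeset(objs, batch_size):
    """Keep only the first occurrence of every pk, then batch the survivors.

    The set of applied updates no longer depends on batch_size - only the slicing does.
    """
    seen, unique = set(), []
    for obj in objs:
        if obj.pk in seen:
            continue   # or raise ValueError(...) here to disallow duplicates entirely
        seen.add(obj.pk)
        unique.append(obj)
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]

# hypothetical usage with plain namedtuples standing in for model instances
Obj = namedtuple("Obj", "pk value")
objs = [Obj(pk, i) for i, pk in enumerate([1, 2, 3, 3, 2, 2, 1, 1, 4])]
print([[o.pk for o in batch] for batch in dedup_whole_changeset(objs, 4)])  # [[1, 2, 3, 4]]
print([[o.pk for o in batch] for batch in dedup_whole_changeset(objs, 2)])  # [[1, 2], [3, 4]]
```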