Closed · jerch closed this 2 years ago
Oh well, the Django ticket got closed as a duplicate, pointing to one that was resolved as "yes, it is a mistake, but not worth fixing, let's just document it". Wth? Idk what's going on there; such a handwaving style normally doesn't make for better software. Thus I asked for reconsideration, which is very unlikely to happen (the communication is weirdly one-sided anyway).
So this leads to a change of plans on how to deal with duplicates: filter them once over the whole changeset before any batching, which gives consistent behavior regarding duplicates that users can rely on, with no side effects from batch_size anymore.
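A minimal sketch of that plan (plain Python; `dedup_changeset` is a hypothetical helper, assuming the first occurrence of a pk wins, matching the current skip semantics):

```python
def dedup_changeset(objs):
    """Filter duplicate pks over the whole changeset before any batching.

    Keeps the first occurrence per pk, so the outcome no longer depends
    on where batch_size later splits the data.
    """
    seen = set()
    unique = []
    for obj in objs:
        if obj.pk not in seen:
            seen.add(obj.pk)
            unique.append(obj)
    return unique
```

Whether the first or the last occurrence should win is a separate design question; the point is that the decision is made once over the whole changeset, not per batch.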
The current duplicate skipping is not stable at the batch border: it creates batch offsets in the following data, with possible follow-up skip inconsistencies.
This is caused by prebatching as done by `bulk_update` vs. aggregated batching in `fast_update`, where the overall updates differ in the end.

NB: This makes me wonder if the original behavior is wanted at all. The fact that a second update gets through just because it ended up in a different batch looks like a surprising side effect, especially since batch_size is only meant to give some control over the query load. Wouldn't it be better to treat a single `bulk_update` call as atomic from the user's perspective, and thus either filter all duplicates from the whole changeset or disallow duplicates entirely? --> https://code.djangoproject.com/ticket/33672
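To illustrate the batch border effect with a toy model (plain Python, no ORM; the names and the per-batch skipping are illustrative, not the actual `bulk_update`/`fast_update` code): whether a duplicate pk gets skipped or applied depends solely on where the batch border happens to fall.

```python
from itertools import islice

def batched(it, n):
    """Split an iterable into lists of size n."""
    it = iter(it)
    while batch := list(islice(it, n)):
        yield batch

def skip_dups_per_batch(changes, batch_size):
    """Per-batch duplicate skipping (roughly what prebatching amounts to):
    a duplicate pk is only skipped if it lands in the same batch;
    later batches simply overwrite earlier ones, like sequential UPDATEs."""
    applied = {}
    for batch in batched(changes, batch_size):
        seen = set()
        for pk, value in batch:
            if pk in seen:
                continue  # skipped, but only within this batch
            seen.add(pk)
            applied[pk] = value
    return applied

changes = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]

print(skip_dups_per_batch(changes, batch_size=4))
# {1: 'a', 2: 'b', 3: 'd'}  -- duplicate (1, 'c') skipped: same batch

print(skip_dups_per_batch(changes, batch_size=2))
# {1: 'c', 2: 'b', 3: 'd'}  -- (1, 'c') got through: different batch
```

The same changeset yields different final rows purely as a function of batch_size, which is exactly the surprising side effect described above.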