With #12 we introduced set-based pk duplicate detection in Python, which is not how it should be done. We simply cannot do that solely in Python, as the db might apply different rules for identity checks.
On the other hand we also cannot simply ignore duplicates and let the db "somehow" deal with them, as that would surface db differences:
- postgres / sqlite: documented as undefined behavior (in tests the first value was always applied, later values were ignored)
- mysql: the docs say nothing about it (in tests all value changes were applied in logical order)
- oracle shims (yet to come): raises a duplicate error
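To make the ambiguity concrete, here is a minimal sketch of the problematic input (the `Item` model and the `fast_update` call are placeholder names for illustration, not the actual API): two input objects share a pk but disagree on the value, so the final row state depends entirely on which engine behavior from the list above applies.

```python
from myapp.models import Item  # hypothetical model with an IntegerField `value`

# Two update rows targeting the same pk with conflicting values.
a = Item(pk=1, value=1)
b = Item(pk=1, value=2)

# Depending on the engine this may persist value=1 (postgres/sqlite:
# later duplicates ignored), value=2 (mysql: changes applied in order),
# or raise a duplicate error (oracle shims).
Item.objects.fast_update([a, b], ['value'])
```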
Option 1: keep set/hash reduction for pk types, where applicable
Using Python internals is by far the fastest approach, so it might be a good idea to keep it for primitive field types that are known to be handled the same way by every db engine. This should be true for int and string types, and maybe also for date types (caution with sqlite here).
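A minimal sketch of what that could look like; the type whitelist and the helper name are assumptions for illustration, not the actual implementation:

```python
# Pk types whose Python equality/hash semantics are assumed to match
# the db's identity rules (date types would need extra care on sqlite).
SAFE_PK_TYPES = (int, str)

def find_duplicate_pks(objs):
    """Return the set of pks seen more than once, or None if any pk has
    a type we cannot safely compare in Python (caller then falls back
    to the db roundtrip described in option 2)."""
    seen, duplicates = set(), set()
    for obj in objs:
        pk = obj.pk
        if not isinstance(pk, SAFE_PK_TYPES):
            return None
        if pk in seen:
            duplicates.add(pk)
        seen.add(pk)
    return duplicates
```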
Option 2: explicit db roundtrip with a pk__in reduction
This should be possible as a fallback for all types, but it creates an additional db query (bad perf). It is probably the only way for complex types (e.g. json, hstore, custom-defined types). Most projects will never use a complex type as pk type (discouraged), as they are not even stable across db engines (e.g. json identity is handled very differently by postgres/jsonb vs. sqlite/string-repr).
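A rough sketch of that fallback, assuming all input objects already exist in the db (true for an update); the function name and return convention are again just illustrative:

```python
def has_db_duplicates(model, objs):
    """Let the db apply its own identity rules: send all candidate pks
    in one pk__in query and compare the number of distinct matched rows
    against the number of input objects. Fewer matches than inputs means
    at least two inputs collapse onto the same row."""
    pks = [obj.pk for obj in objs]
    matched = set(
        model.objects.filter(pk__in=pks).values_list('pk', flat=True)
    )
    return len(matched) < len(pks)
```

Whatever the dedupe policy ends up being, the single pk__in query keeps the overhead at one extra roundtrip regardless of batch size.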