This PR rewrites the bulk operation used to link metadata records to Node instances to avoid hitting the IN clause max on Django's supported database backends.
We're using SQLite for ATLAS, where the default limit can be as low as 999, though it is higher on most OSes used in dev or production: macOS sets it to 500,000 and Alpine to 250,000.
I profiled various combinations for the slice_large_list function; a batch size of 2000 gave acceptable performance with more than 250,000 records.
The limit we hit was SQLITE_MAX_VARIABLE_NUMBER on Ubuntu (~250,000), so batching keeps each query well below it.
Choosing 2000 also matches the default chunk_size (2000) used by QuerySet.iterator.
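For reviewers, here is a minimal sketch of the batching idea described above. It is not the code in this PR: the signature of slice_large_list, the fetch_ids_in_batches helper, and the node table are assumptions made for illustration, and the standard-library sqlite3 module stands in for the Django ORM call.

```python
import sqlite3
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

BATCH_SIZE = 2000  # batch size named in this PR


def slice_large_list(items: Sequence[T], batch_size: int = BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long.

    Hypothetical signature; the real helper in the PR may differ.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fetch_ids_in_batches(conn: sqlite3.Connection, ids: Sequence[int],
                         batch_size: int = BATCH_SIZE) -> List[int]:
    """Run one IN (...) query per batch instead of one query with every id.

    Each query binds at most `batch_size` parameters, so it stays under
    SQLITE_MAX_VARIABLE_NUMBER as long as batch_size is below that limit.
    The `node` table name is an assumption for this sketch.
    """
    found: List[int] = []
    for batch in slice_large_list(ids, batch_size):
        placeholders = ",".join("?" * len(batch))  # "?,?,?" with one "?" per id
        query = f"SELECT id FROM node WHERE id IN ({placeholders})"
        found.extend(row[0] for row in conn.execute(query, list(batch)))
    return found
```

The trade-off is more round trips in exchange for a bounded parameter count per query. Note that a batch size of 2000 still exceeds the 999 default of older SQLite builds, so it relies on the higher limits mentioned above.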