scaife-viewer / backend

Packages and utilities to build Scaife Viewer backends using ATLAS / CTS resolvers

[atlas] Make bulk metadata ingestion more resilient #64

Closed jacobwegner closed 1 year ago

jacobwegner commented 1 year ago

This PR rewrites the bulk operation used to link metadata records to Node instances so that it avoids exceeding the maximum number of parameters allowed in an IN clause on Django's supported database backends.
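
For reference, the failure mode looks roughly like the snippet below. This is a standalone illustration using Python's sqlite3 module, not code from this PR; the table name and id count are made up.

```python
import sqlite3

# Each value in an IN (...) clause becomes a bound parameter, so a large
# id list can exceed SQLITE_MAX_VARIABLE_NUMBER and fail to prepare.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY)")

ids = list(range(1_000_000))  # illustrative size, larger than any limit discussed below
placeholders = ", ".join("?" for _ in ids)
try:
    conn.execute(f"SELECT id FROM node WHERE id IN ({placeholders})", ids)
except sqlite3.OperationalError as exc:
    print(exc)  # e.g. "too many SQL variables"
```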

We're using SQLite for ATLAS; its default limit can be as low as 999, though it is higher on most OSes used in dev or production: macOS sets the limit to 500,000 and Alpine to 250,000.

I profiled various combinations for the slice_large_list function; a batch size of 2000 gave acceptable performance with > 250,000 records.
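
A minimal sketch of what a batching helper along the lines of slice_large_list could look like; the signature and exact semantics here are assumptions, and only the name and the 2000 default come from this PR:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def slice_large_list(items: Iterable[T], batch_size: int = 2000) -> Iterator[List[T]]:
    """Yield successive batches of at most batch_size items (sketch, not the PR's code)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each batch can then back an `id__in` filter or a bulk update, keeping the number of bound parameters per statement well under the limits described here.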

The limit we actually hit was SQLITE_MAX_VARIABLE_NUMBER on Ubuntu (~250,000); batching the operation lets us stay under it.

Choosing 2000 also hews closely to the default chunk size that QuerySet.iterator uses.