Closed · evansd closed this 4 months ago
| Latest commit | 4f2f369 |
| --- | --- |
| Status | ✅ Deploy successful! |
| Preview URL | https://e31d2e69.databuilder.pages.dev |
| Branch Preview URL | https://evansd-insert-many.databuilder.pages.dev |
Given that on l233 a batch_size's worth of rows is loaded into memory (AFAICT), could 10k rows cause problems with some very wide tables-from-files?
Yeah, we do load a full batch into memory (SQLAlchemy refuses to work with it otherwise), but to use even 1GB of RAM we'd be talking 100KB per row, which seems very wide and likely to cause problems in other ways.
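As an illustration of the pattern being described, here is a minimal sketch (the function name `insert_in_batches` and the default batch size are hypothetical, not the actual databuilder code). Rows are drawn lazily from an iterator, so at most one batch's worth of parameter dicts is materialised at a time, and 10k rows would only reach 1GB at roughly 100KB per row:

```python
import itertools

import sqlalchemy


def insert_in_batches(connection, table, rows, batch_size=10_000):
    """Insert `rows` (an iterable of dicts) in chunks of `batch_size`.

    Only one batch is held in memory at a time; SQLAlchemy needs a
    concrete list of parameter dicts for executemany-style execution.
    """
    rows = iter(rows)
    while batch := list(itertools.islice(rows, batch_size)):
        connection.execute(sqlalchemy.insert(table), batch)
```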
The SQLAlchemy docs seem quite vague about actual numbers here, so I guess a best guess is all we've got?
I think it's vague because it's configured per dialect. For MSSQL I think it does 999, which roughly matches SQL Server's documented 1,000-row limit on a single INSERT ... VALUES list.
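An aside for context: in SQLAlchemy 2.0 the page size for the "insertmanyvalues" feature is a per-dialect default that can be overridden at engine creation, which would explain a 999-ish figure on MSSQL. A minimal sketch (SQLite is used only so it runs anywhere; the same parameter applies to the MSSQL dialect):

```python
import sqlalchemy

# Override the per-dialect "insertmanyvalues" page size at engine
# creation; each generated INSERT ... VALUES statement will then carry
# at most this many rows.
engine = sqlalchemy.create_engine("sqlite://", insertmanyvalues_page_size=999)
print(engine.dialect.insertmanyvalues_page_size)  # -> 999
```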
> very wide and likely to cause problems in other ways.
Almost certainly. I do have a vague recollection of a supremely wide table at some point, but 100KB a row it almost certainly wasn't!
The `InsertMany` construct is most frequently used for codelists, which are small enough that it's not worth optimising. However, it is also used for the `@table_from_file` feature, where the row counts are much higher – and I have seen such inserts take 2 hours 40 minutes. Using SQLAlchemy's batched inserts is about 10x faster on MSSQL, and about another order of magnitude faster again on Trino.
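To make the speed-up concrete, here is a self-contained sketch (in-memory SQLite, illustrative table and column names, not the actual databuilder schema) contrasting row-at-a-time inserts with the batched form, where passing a list of parameter dicts lets SQLAlchemy execute the statement as a single batched executemany:

```python
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")
metadata = sqlalchemy.MetaData()
events = sqlalchemy.Table(
    "events",
    metadata,
    sqlalchemy.Column("patient_id", sqlalchemy.Integer),
    sqlalchemy.Column("code", sqlalchemy.String),
)
metadata.create_all(engine)

rows = [{"patient_id": i, "code": f"C{i}"} for i in range(10_000)]

with engine.begin() as connection:
    # Slow: one statement compilation and round trip per row.
    # for row in rows:
    #     connection.execute(events.insert().values(**row))

    # Fast: a list of parameter dicts is executed as a batched
    # executemany, amortising compilation and round trips.
    connection.execute(events.insert(), rows)
```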
Closes #1934