metadb-project / metadb

Metadb extends PostgreSQL with features to support analytics such as streaming data sources, data model transforms, and historical data
Apache License 2.0

Resync performance is much slower than initial Sync performance #85

Open carolegodfrey opened 1 month ago

carolegodfrey commented 1 month ago

We ran an initial sync for a FOLIO Poppy environment (metadb v1.2.8), and it took ~51 hours for the process to complete. We recently upgraded this Poppy environment to Quesnelia and ran a resync (metadb v1.3.2).
It took ~114 hours for the process to complete.

Is it expected that a resync process would take significantly longer than an initial sync process?

Are there stats for how long a sync/resync process should take given data sizes and resources?

Are there any plans to improve resync performance?

Both were executed with the same resources (database and EC2 instance type).

Approximate size of largest tables:
mod_inventory_storage.instance ~ 10 million records
mod_inventory_storage.holdings_record ~ 11 million records
mod_inventory_storage.item ~ 9.8 million records
mod_source_record_storage.marc_records_lb ~ 21 million records
mod_source_record_storage.records_lb ~ 21 million records
mod_source_record_manager.journal_records ~ 34 million records
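For a rough sense of scale, the figures above can be turned into an approximate throughput comparison. This is only a back-of-the-envelope sketch: it counts just the six largest tables listed here (the full dataset has more), and the two runs used different metadb versions and FOLIO releases, so the numbers are indicative rather than a benchmark. The variable and function names are made up for illustration.

```python
# Approximate row counts for the largest tables, taken from the list above.
largest_tables = {
    "mod_inventory_storage.instance": 10_000_000,
    "mod_inventory_storage.holdings_record": 11_000_000,
    "mod_inventory_storage.item": 9_800_000,
    "mod_source_record_storage.marc_records_lb": 21_000_000,
    "mod_source_record_storage.records_lb": 21_000_000,
    "mod_source_record_manager.journal_records": 34_000_000,
}

# Wall-clock durations reported above (approximate).
INITIAL_SYNC_HOURS = 51   # metadb v1.2.8, Poppy
RESYNC_HOURS = 114        # metadb v1.3.2, Quesnelia


def rows_per_hour(total_rows: int, hours: float) -> float:
    """Average throughput over the whole run, ignoring all other tables."""
    return total_rows / hours


total_rows = sum(largest_tables.values())
initial_rate = rows_per_hour(total_rows, INITIAL_SYNC_HOURS)
resync_rate = rows_per_hour(total_rows, RESYNC_HOURS)
slowdown = RESYNC_HOURS / INITIAL_SYNC_HOURS

print(f"rows in listed tables: {total_rows:,}")
print(f"initial sync: {initial_rate:,.0f} rows/hour")
print(f"resync:       {resync_rate:,.0f} rows/hour")
print(f"resync took {slowdown:.2f}x as long as the initial sync")
```

The ratio of the two durations is the most reliable number here, since it does not depend on which tables are counted.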