sul-dlss / searchworks_traject_indexer

indexing MARC, MODS, and more for SearchWorks

Poppy upgrade: Investigate using `instance.complete_updated_date` to replace instance/holdings/items join #1332

Closed: cbeer closed this issue 7 months ago

cbeer commented 7 months ago

Possibly fixes https://github.com/sul-dlss/searchworks_traject_indexer/issues/980

hudajkhan commented 7 months ago

It looks like Poppy added a timestamp field on the instance that reflects whether its holdings or items have changed. We need to confirm that this is actually the case, and then see whether we can use that field to save some indexing work.

This may also require assessing the impact on the indexing memory issues.
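To make the idea concrete, here is a minimal sketch of the comparison we would want to run, not the indexer's actual query. It uses an in-memory SQLite database with toy rows; the real data lives in the FOLIO database, and the `updated_date` columns, the sample IDs, and the `since` cutoff are all hypothetical stand-ins. Only `complete_updated_date`, `instanceid`, and `holdingsrecordid` come from this issue.

```python
import sqlite3

# Toy schema: the updated_date columns are hypothetical stand-ins for
# however each record's own modification time is actually stored.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instance (id TEXT PRIMARY KEY, updated_date TEXT, complete_updated_date TEXT);
CREATE TABLE holdings_record (id TEXT PRIMARY KEY, instanceid TEXT, updated_date TEXT);
CREATE TABLE items (id TEXT PRIMARY KEY, holdingsrecordid TEXT, updated_date TEXT);

-- i1: only its item changed recently; i2: nothing changed recently.
INSERT INTO instance VALUES ('i1', '2024-01-01', '2024-03-05'),
                            ('i2', '2024-01-02', '2024-01-02');
INSERT INTO holdings_record VALUES ('h1', 'i1', '2024-02-01'),
                                   ('h2', 'i2', '2024-01-02');
INSERT INTO items VALUES ('it1', 'h1', '2024-03-05'),
                         ('it2', 'h2', '2024-01-02');
""")

since = "2024-03-01"  # hypothetical "last indexed" cutoff

# Join-based approach: an instance counts as changed if the instance,
# any of its holdings, or any of their items changed after the cutoff.
JOIN_QUERY = """
SELECT DISTINCT i.id
FROM instance i
LEFT JOIN holdings_record h ON h.instanceid = i.id
LEFT JOIN items it ON it.holdingsrecordid = h.id
WHERE i.updated_date > :since
   OR h.updated_date > :since
   OR it.updated_date > :since
ORDER BY i.id
"""

# Single-table approach: trust the instance-level timestamp alone.
TIMESTAMP_QUERY = """
SELECT id FROM instance
WHERE complete_updated_date > :since
ORDER BY id
"""

changed_via_join = [row[0] for row in conn.execute(JOIN_QUERY, {"since": since})]
changed_via_timestamp = [row[0] for row in conn.execute(TIMESTAMP_QUERY, {"since": since})]

print(changed_via_join, changed_via_timestamp)
```

If the two lists agree on real data, the join could in principle be dropped; any instance that appears only in the join result would be a case where the timestamp is not being maintained as expected.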

dnoneill commented 7 months ago

See comment https://github.com/sul-dlss/searchworks_traject_indexer/pull/1310

dnoneill commented 7 months ago

Tests completed

Based on the caveats, this does not fix #980, but it does allow us to clean up some code.

dnoneill commented 7 months ago

A couple of things came up while exploring the data:

| Table | Number of rows | How to access the instance |
| --- | --- | --- |
| sul_mod_inventory_storage.instance | 10,284,710 | n/a |
| sul_mod_inventory_storage.holdings_record | 12,366,458 | instanceid field |
| sul_mod_inventory_storage.items | 11,429,559 | holdingsrecordid field |

One odd thing about holdings_record is that there don't seem to be any empty instanceid fields (assuming I queried correctly). Some holdings share the same instanceid (see WHERE instanceid = '6b68aea8-2230-52a9-b008-ce8f21309f90').
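For anyone who wants to repeat these two checks, the following is a rough sketch of the kind of queries involved, run against an in-memory SQLite table with made-up rows rather than the real database. The table and `instanceid` column names follow the ones above; the sample IDs are invented, and the exact queries used for the numbers in this comment are not shown here.

```python
import sqlite3

# Toy stand-in for holdings_record with invented IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holdings_record (id TEXT PRIMARY KEY, instanceid TEXT)")
conn.executemany(
    "INSERT INTO holdings_record VALUES (?, ?)",
    [("h1", "inst-a"), ("h2", "inst-a"), ("h3", "inst-b")],
)

# Check 1: holdings with no instance reference (NULL or empty string).
empty_count = conn.execute(
    "SELECT COUNT(*) FROM holdings_record "
    "WHERE instanceid IS NULL OR instanceid = ''"
).fetchone()[0]

# Check 2: instance IDs referenced by more than one holdings record.
shared = conn.execute(
    """
    SELECT instanceid, COUNT(*) AS n
    FROM holdings_record
    GROUP BY instanceid
    HAVING COUNT(*) > 1
    ORDER BY n DESC
    """
).fetchall()

print(empty_count, shared)
```

Checking for both NULL and the empty string is deliberate, since a query that tests only one of them could report no empty fields even when some exist.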