tpendragon closed this issue 3 years ago
@mzelesky Will these dumps include deleted records? I assume they will. Do we need to enrich deleted boundwiths with item / holding data from the host record?
@mzelesky If the incremental dump doesn't include one of the records for the boundwith, we'll have to pull that record from the API to include it in the file. However, the SCSB records must include suppressed records, and the API doesn't return suppressed records. I'm not sure there's a way to fulfill this requirement for that edge case.
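The gap described above can be sketched as a pure lookup: given the bib IDs that made it into the incremental dump and the known boundwith groupings, find the related records that are absent and would need an API fetch. This is a hypothetical sketch; `missing_boundwith_ids` and its inputs are illustrative names, not the project's actual code.

```ruby
require "set"

# Hypothetical helper: find boundwith members that are NOT in the incremental
# dump but belong to a group that the dump touched, so they'd need to be
# fetched from the API before the SCSB file can be written.
def missing_boundwith_ids(dump_ids:, boundwith_groups:)
  dump_set = dump_ids.to_set
  boundwith_groups
    .select { |group| group.any? { |id| dump_set.include?(id) } }     # group is affected by this dump
    .flat_map { |group| group.reject { |id| dump_set.include?(id) } } # members the dump is missing
    .uniq
end
```

Note the caveat from the comment above: if any of these missing records are suppressed, the API fetch may come back empty, which is the unresolved edge case.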
@mzelesky we're considering whether it might work to create a nightly dump of all boundwith host and constituent records to reference as we create the SCSB file. We experimented with sets, though, and couldn't figure out how to generate a dump like this. Do you think it's possible?
I haven't tried it, but have you looked into https://knowledge.exlibrisgroup.com/Alma/Knowledge_Articles/Create_Users_Set_for_expired_patrons_based_on_Analytics? Note that this would be a one-time set.
We think we will actually get suppressed bib records from the bibliographic endpoint, and suppressed holdings from the holdings/ALL/items endpoint. However, we believe the bibliographic endpoint would not return an AVA field for a suppressed holdings record. We have not tested these cases, and we aren't sure suppressed holdings even exist.
Implementation plan:
We tested and confirmed that the suppressed records are returned by the API. A suppressed bib record is returned by the bib endpoint, and a suppressed holding is returned by the holdings endpoint.
The edge case is: when a non-suppressed bib has a suppressed holding attached to it, that holding is not returned in the AVA fields from the bib endpoint.
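The edge case above can be detected by comparing the two endpoints' views of the same bib: any holding the holdings endpoint returns that has no matching AVA entry is a suppressed holding whose data would have to be merged in separately. A minimal sketch, assuming simplified hash shapes for both responses (the real Alma payloads are MARC-XML / richer JSON):

```ruby
# Hypothetical sketch: given the holding IDs extracted from a bib record's AVA
# fields and the full holdings list from the holdings endpoint, return the
# holdings the AVA fields omitted (i.e., suppressed holdings on a
# non-suppressed bib).
def holdings_missing_from_ava(ava_fields:, holdings:)
  ava_ids = ava_fields.map { |field| field["holding_id"] }.compact
  holdings.reject { |holding| ava_ids.include?(holding["holding_id"]) }
end
```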
Closed by #1303
Sibling of #1251
We need more information here; I'm going to fill in my understanding and ask @mzelesky to edit this if I get anything wrong. This functionality is already implemented for BarcodesController.
When a host record appears in the incremental dump:
When a constituent record appears in the incremental dump:
The architecture here needs to avoid making a large number of single-bib API calls. We may be able to pre-scan for all the constituents/hosts we're going to need to query and fetch them in bulk API calls.
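The pre-scan idea above amounts to collecting every host/constituent MMS ID up front and batching them so each request carries many IDs instead of one. A sketch under assumptions: the batch size of 100 and the comma-separated `mms_id` query format are assumptions to verify against the Alma API docs, and `batched_bib_queries` is an illustrative name.

```ruby
# Hypothetical sketch: turn the pre-scanned list of MMS IDs into query strings
# for bulk bib requests, so we issue one API call per batch rather than one
# per record.
def batched_bib_queries(mms_ids, batch_size: 100)
  mms_ids.uniq.each_slice(batch_size).map { |batch| batch.join(",") }
end

# Usage (assumed endpoint shape, not verified here):
#   batched_bib_queries(ids).each do |param|
#     # GET /almaws/v1/bibs?mms_id=#{param}
#   end
```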
The records we have set up correctly are listed on https://github.com/pulibrary/orangelight/issues/2431