pulibrary / bibdata

Local API for retrieving bibliographic and other useful data from Alma (Ruby 3.2.0, Rails 7.1.3.4)
BSD 2-Clause "Simplified" License

Implement RECAP/SCSB Submit collection - Ensure non-recap holdings and items are excluded from scsb record dump #1373

Closed hackartisan closed 1 year ago

hackartisan commented 3 years ago

The dump we're creating for SCSB contains holdings that are not actually ReCAP holdings. We don't want these loaded into the SCSB system.

We need to filter out holdings (852s) and items (876s) that aren't ReCAP holdings or items.

Implementation:

We need an example record we can use for this work.

@mzelesky please review the logic outlined here. If you can provide the 876 subfield that gives the holding id that would also be helpful.

christinach commented 3 years ago

I also updated the description above. We use 852|8 to get the holding id, and 876|0 to find the items attached to that holding.

mzelesky commented 3 years ago

Here is an algorithm for removing the unwanted fields:

  1. Identify all 852 fields with subfield 8 that have non-ReCAP locations (not recap_rmt).
  2. Remove all 852, 866, 867, and 868 fields with subfield 8 values that correspond to 852|8 values from step 1.
  3. Remove all 876 fields with the values from step 1 in the subfield 0.
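
The three steps above can be sketched in plain Ruby. This is only an illustration, not the submit collection script: `Field` is a hypothetical stand-in for a MARC data field, and the test for "is this 852 a ReCAP holding" is passed in as a block, because the thread does not say which 852 subfield carries the `recap_rmt` value. The usage lines assume 852$b purely for the example.

```ruby
# Hypothetical stand-in for a MARC data field: a tag plus [code, value] pairs.
Field = Struct.new(:tag, :subfields) do
  # Returns the first value for the given subfield code, or nil.
  def sub(code)
    pair = subfields.find { |c, _| c == code }
    pair && pair[1]
  end
end

HOLDING_TAGS = %w[852 866 867 868].freeze

# Step 1: collect the 852$8 values of holdings the block does not accept as ReCAP.
def non_recap_holding_ids(fields, &recap)
  fields.select { |f| f.tag == '852' && f.sub('8') && !recap.call(f) }
        .map { |f| f.sub('8') }
end

# Steps 2 and 3: drop holding fields whose $8, and item fields whose $0,
# match one of the ids collected in step 1.
def reject_non_recap(fields, &recap)
  ids = non_recap_holding_ids(fields, &recap)
  fields.reject do |f|
    (HOLDING_TAGS.include?(f.tag) && ids.include?(f.sub('8'))) ||
      (f.tag == '876' && ids.include?(f.sub('0')))
  end
end

# Usage with made-up ids; checking 852$b is an assumption for this example.
fields = [
  Field.new('852', [%w[8 22100], %w[b recap_rmt]]),
  Field.new('852', [%w[8 22200], %w[b firestone]]),
  Field.new('866', [%w[8 22200], %w[a v.1-3]]),
  Field.new('876', [%w[0 22100], %w[a 23100]]),
  Field.new('876', [%w[0 22200], %w[a 23200]])
]
kept = reject_non_recap(fields) { |f| f.sub('b') == 'recap_rmt' }
puts kept.map(&:tag).inspect
```

Fields without a matching link value, such as the bibliographic fields, pass through untouched.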

mzelesky commented 3 years ago

Here is an MMS ID for a record with both ReCAP and non-ReCAP items: 993704353506421

tpendragon commented 3 years ago

@mzelesky Can you send a link to the submit collection code you wrote here?

christinach commented 2 years ago

@kevinreiss said to check back on this in February 2022.

christinach commented 2 years ago

see https://app.zenhub.com/workspaces/orangelightbibdatarequests-571691cab409d8d821b873be/issues/pulibrary/bibdata/1421

christinach commented 2 years ago

@mzelesky and I reviewed the workflow in his script and updated the comments in the script. For reference:

### Steps to exporting boundwiths for SubmitCollection
### 1. Create a set in Alma of all records in RECAP locations
### 2. Run a publishing job with physical enrichment on those records - (there are no portfolios)
### 4. For each record:
### https://github.com/pulibrary/bibdata/blob/91b94588471483b584129c6c426b4ce79e22f286/marc_to_solr/lib/process_holdings_helpers.rb#L16
### https://github.com/pulibrary/bibdata/blob/91b94588471483b584129c6c426b4ce79e22f286/marc_to_solr/lib/process_holdings_helpers.rb#L28
### https://github.com/pulibrary/bibdata/blob/91b94588471483b584129c6c426b4ce79e22f286/marc_to_solr/lib/process_holdings_helpers.rb#L33
###   a. Remove any 852, 866, 867, and 868 fields without $8
###   b. Remove any 876 fields without $0 and $a
###   c. Append the 852$c value to the 852$b joining with a dollar sign
###   d. Remove the 852$c
###   e. Append the 852$i value to the 852$h joining with a space (the h is the subject part of the call number; i is the item-specific part)
###   f. Remove the 852$i
###   g. Change $8 subfields in 852, 866, 867, 868 to $0
###   h. If there is no 876$3, make sure you add one (even empty). This is the description subfield.
###   i. Add the use restriction to 876$h, always include it. There has to be an 876$h.
###   j. Add the CGD to 876$x 
###   k. Replace the 876$z if there is one for the customer code (PA, etc.). In most cases I upcase it but there are certain locations where we replace it with a different value.
###   l. Add 876$l with the value of RECAP
###   m. Remove 876$x (it used to be process type)
###   n. Move 876$y to 876$k (owning library) and then delete the 876$y
#### The following two steps are Only for Host records.
#### 5. Retrieve the constituent records (by finding ids from 774$w fields) if it is a host record. Do an API call.
#### 6. For each 876 in the host record: 
####   a. process all of the constituents attached to the host:
####     i. Remove all existing 852, 866, 867, 868, and 876 values from the
####       constituent records. (A constituent record does not have any holdings.)
####     ii. Copy the 876 field and the associated 852, 866, 867, 868 fields to the
####       constituent (the 876$0 links to the $0 of the related holding fields).
####   b. Output the host record as well, with the holding and item fields filtered to the individual 876 being evaluated. Example: a host with three 876s (items) should generate three bibs for the host, each with one 876.
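
For readers who want to see the per-record steps as code, here is a rough Ruby sketch of steps a through h, l, and n. It is not the script referenced above: `Field` is a hypothetical stand-in for a MARC data field, step b is read as "drop an 876 unless it has both $0 and $a" (the wording is ambiguous), and steps i, j, k, and m are left out because they depend on data that this thread does not describe.

```ruby
# Hypothetical stand-in for a MARC data field: a tag plus [code, value] pairs.
Field = Struct.new(:tag, :subfields) do
  def sub(code)
    pair = subfields.find { |c, _| c == code }
    pair && pair[1]
  end
end

HOLDING_TAGS = %w[852 866 867 868].freeze

# Steps a and b: keep holding fields only if they have $8, and item fields
# only if they have both $0 and $a (assumed reading of step b).
def linked?(field)
  return !field.sub('8').nil? if HOLDING_TAGS.include?(field.tag)
  return !field.sub('0').nil? && !field.sub('a').nil? if field.tag == '876'
  true
end

# Steps c-f: append the `from` value to `into` with `joiner`, then drop `from`.
# Left unchanged when either subfield is missing (an assumption).
def merge_subfield(pairs, into, from, joiner)
  extra = pairs.find { |c, _| c == from }
  return pairs unless extra && pairs.any? { |c, _| c == into }
  pairs.reject { |c, _| c == from }
       .map { |c, v| c == into ? [c, "#{v}#{joiner}#{extra[1]}"] : [c, v] }
end

def normalize_holding(field)
  pairs = field.subfields
  if field.tag == '852'
    pairs = merge_subfield(pairs, 'b', 'c', '$') # steps c and d
    pairs = merge_subfield(pairs, 'h', 'i', ' ') # steps e and f
  end
  Field.new(field.tag, pairs.map { |c, v| c == '8' ? ['0', v] : [c, v] }) # step g
end

def normalize_item(field)
  pairs = field.subfields.map { |c, v| c == 'y' ? ['k', v] : [c, v] } # step n
  pairs += [['3', '']] unless pairs.any? { |c, _| c == '3' }           # step h
  # Step l; replacing an existing $l rather than duplicating it is an assumption.
  pairs = pairs.reject { |c, _| c == 'l' } + [%w[l RECAP]]
  Field.new(field.tag, pairs)
end

def normalize(fields)
  fields.select { |f| linked?(f) }.map do |f|
    if HOLDING_TAGS.include?(f.tag) then normalize_holding(f)
    elsif f.tag == '876' then normalize_item(f)
    else f
    end
  end
end
```

Keeping each step in its own small method makes it easier to compare the code against the lettered list when the rules change.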

christinach commented 1 year ago

With the latest update and per @mzelesky we want to submit a separate record for each item.
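
As a rough illustration of "a separate record for each item", the sketch below splits one record's fields into one field list per 876. It assumes the holding fields have already had their $8 renamed to $0, so that holdings and items both link on $0, and `Field` is a hypothetical stand-in for a MARC data field rather than anything from the actual script.

```ruby
# Hypothetical stand-in for a MARC data field: a tag plus [code, value] pairs.
Field = Struct.new(:tag, :subfields) do
  def sub(code)
    pair = subfields.find { |c, _| c == code }
    pair && pair[1]
  end
end

HOLDING_TAGS = %w[852 866 867 868].freeze

# Returns one field list per 876: the bibliographic fields, the holding
# fields whose $0 matches that item's $0, and the item itself.
def split_by_item(fields)
  bib = fields.reject { |f| HOLDING_TAGS.include?(f.tag) || f.tag == '876' }
  items = fields.select { |f| f.tag == '876' }
  items.map do |item|
    holdings = fields.select do |f|
      HOLDING_TAGS.include?(f.tag) && f.sub('0') == item.sub('0')
    end
    bib + holdings + [item]
  end
end
```

A record with three 876 fields would yield three field lists, each carrying exactly one 876 and only the holding fields it links to.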

christinach commented 1 year ago

@christinach @mzelesky make sure we don't send PUL items with a d in the leader to SCSB.
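
A minimal sketch of that check, assuming "a d in the leader" refers to leader position 05 (record status), where MARC 21 uses d for a deleted record. The record shape here, a hash with a `:leader` string, is hypothetical.

```ruby
# Assumption: the "d" in question is MARC leader position 05 (record status).
def deleted?(leader)
  leader.to_s[5] == 'd'
end

# Keeps only records that are not flagged as deleted.
# Records are plain hashes here (hypothetical shape: { leader: String, ... }).
def sendable(records)
  records.reject { |r| deleted?(r[:leader]) }
end
```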

christinach commented 1 year ago

closing in favor of https://github.com/pulibrary/lib_jobs/issues/495