sul-dlss / FOLIO-Project-Stanford

Task management for Stanford’s analysis of FOLIO.

update find_bwparents.pl for records where we don't know the parent barcode #467

Closed shelleydoljack closed 1 year ago

shelleydoljack commented 1 year ago

For the BW-CHILD records that do not have the BW-PARENT barcode in the 590 field, we need to create FOLIO holdings records (not just a FOLIO instance record). Update the /s/SUL/Bin/folio_symphony_extract/Bibs/find_bwparents.pl script to write the item data for these records to the .... tsv file.

jermnelson commented 1 year ago

If we included these BW-CHILD records without the BW-PARENT barcode in the existing ckeys*.tsv.bwchild.tsv file, would the BARCODE column be an empty string then? If so, we could add a conditional and not create a BW parts record for that Holdings record. Later tasks in the DAG would just POST those records along with the rest of the BW-CHILD Holdings records to FOLIO.
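A minimal sketch of the conditional described here, assuming the bwchild TSV has a header row with a BARCODE column. The CATKEY column and the function name `split_bwchild_rows` are hypothetical and only there for illustration; the real DAG task may be structured differently.

```python
import csv
import io


def split_bwchild_rows(tsv_text):
    """Split BW-CHILD rows into holdings rows and BW parts rows.

    Every row becomes a Holdings record; only rows with a non-empty
    BARCODE value also get a BW parts record.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    holdings, bw_parts = [], []
    for row in reader:
        holdings.append(row)
        # An empty BARCODE means the BW-PARENT barcode is unknown,
        # so no BW parts record is created for this Holdings record.
        if (row.get("BARCODE") or "").strip():
            bw_parts.append(row)
    return holdings, bw_parts


# Hypothetical two-row sample: one known parent barcode, one blank.
sample = "CATKEY\tBARCODE\n111\t36105042252317\n222\t\n"
holdings, bw_parts = split_bwchild_rows(sample)
```

With this shape, both rows would still be posted as Holdings records, and only the first would produce a BW parts record.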

shelleydoljack commented 1 year ago

Yes, I could simply not write out barcode data for this group of records. Good idea!

shelleydoljack commented 1 year ago

Updated find_bwparents.pl. Still needs testing.

shelleydoljack commented 1 year ago

The script was also updated to expand what qualifies as a BW-PARENT barcode to include non-36105* barcodes, such as: 9080141-3001 (item id) and 001ABB2438 (item id).

shelleydoljack commented 1 year ago

I did a test run of find_bwparents.pl. The resulting file is larger than the one from the last extract:

22M Jun 29 15:21 bwchild_items_all_updated
20M May  4 22:45 bwchild_items_all_updated_bak

Some lines have the barcode field blank:

37686|2|1||SAL3|SEE-OTHER|SEE-OTHER|STKS-MONO|BW-CHILD||0|1|DEWEY|913.54 .M18B|V.3:PT.3|0|MARC|0|

Some lines have more than 18 columns (see https://app.zenhub.com/files/260334947/af7f19d6-9a05-4536-9997-12a3d1428a81/download). They look like this:

737430|2|1|36105042252317(item id)||HOPKINS|SEE-OTHER|SEE-OTHER|STKS|BW-CHILD||0|1|LC|QH91 .L6|NO.4|0|MARC|0|
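One way to flag both kinds of lines in the output file is sketched below. It assumes 18 pipe-delimited fields per line followed by a trailing delimiter, and that the barcode is the fourth field, as in the two sample lines above. The names `EXPECTED_FIELDS`, `BARCODE_INDEX`, and `check_line` are hypothetical.

```python
EXPECTED_FIELDS = 18  # assumption: taken from "more than 18 columns" above
BARCODE_INDEX = 3     # assumption: barcode is the fourth field in the samples


def check_line(line):
    """Report field count, extra columns, and a blank barcode for one line."""
    fields = line.rstrip("\n").split("|")
    # A trailing "|" yields one empty element at the end; drop it.
    if fields and fields[-1] == "":
        fields.pop()
    has_barcode_field = len(fields) > BARCODE_INDEX
    return {
        "field_count": len(fields),
        "extra_columns": len(fields) > EXPECTED_FIELDS,
        "blank_barcode": has_barcode_field and fields[BARCODE_INDEX] == "",
    }


blank = check_line(
    "37686|2|1||SAL3|SEE-OTHER|SEE-OTHER|STKS-MONO|BW-CHILD||0|1|DEWEY|913.54 .M18B|V.3:PT.3|0|MARC|0|"
)
extra = check_line(
    "737430|2|1|36105042252317(item id)||HOPKINS|SEE-OTHER|SEE-OTHER|STKS|BW-CHILD||0|1|LC|QH91 .L6|NO.4|0|MARC|0|"
)
```

Under these assumptions the first sample line has 18 fields with a blank barcode, and the second has 19 fields.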

shelleydoljack commented 1 year ago

Figured out that staff data-entry errors made it difficult to write a regex that reliably removes the "item id" note from the 590 subfield b, so I changed it to match a predictable barcode pattern instead.
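The idea of matching the barcode rather than stripping the note can be sketched as follows. This is not the regex from find_bwparents.pl (a Perl script whose pattern is not shown in this thread); the three barcode shapes below are assumptions inferred only from the examples quoted here (36105042252317, 9080141-3001, 001ABB2438), and `BARCODE_RE` and `extract_barcode` are hypothetical names.

```python
import re

# Assumption: barcode shapes guessed from the examples in this thread only.
BARCODE_RE = re.compile(
    r"\b("
    r"36105\d{9}"          # 14-digit barcode starting with 36105
    r"|\d{7}-\d{4}"        # e.g. 9080141-3001
    r"|\d{3}[A-Z]{3}\d{4}" # e.g. 001ABB2438
    r")\b"
)


def extract_barcode(subfield_b):
    """Return the first barcode-like token in a 590 subfield b, or ''."""
    match = BARCODE_RE.search(subfield_b)
    return match.group(1) if match else ""


results = [
    extract_barcode("36105042252317(item id)"),
    extract_barcode("9080141-3001 (item id)"),
    extract_barcode("001ABB2438 (item id)"),
    extract_barcode("no barcode recorded"),
]
```

Because the pattern anchors on the barcode itself, variations in how staff typed the "item id" note (spacing, capitalization, missing parentheses) do not affect the extracted value.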