sul-dlss / FOLIO-Project-Stanford

Task management for Stanford’s analysis of FOLIO.

call number getting lopped off when extracting record #468

Closed shelleydoljack closed 1 year ago

shelleydoljack commented 1 year ago

Something really weird is going on with the call numbers and enumeration for the item records on ckey 4084116. Using the selection we use for the generate marc and item script, we get:

sirsi@bodoni /s/SUL/Bin/folio_symphony_extract/Bibs> echo 4084116 | trs selitem -iC  -m"~WITHDRAWN" -oNIBylmteg26 2> /dev/null | trs selcallnum -iN -oCSpBZ2 2> /dev/null | trs selcatalog -iC -oSf6 2> /dev/null
4084116|1|1|4084116-1001    |SPEC-COLL|MANUSCRIPT|MANUSCRIPT|MANUSCRIPT||TAX=8.25|0|1|ASIS|M0699||0|MANUSCRPT|0|
4084116|2|1|36105116508214  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M069| BOX 1|0|MANUSCRPT|0|
4084116|3|1|36105116508222  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M069| BOX 2|0|MANUSCRPT|0|
4084116|4|1|36105116508230  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M069| BOX 3|0|MANUSCRPT|0|

Instead of M0699 BOX 1 we get M069 BOX 1. Maybe something is not right with how the data is entered in the call number field?

shelleydoljack commented 1 year ago

The callnum table data for this record is:

"CATALOG_KEY","CALL_SEQUENCE","LIBRARY","ITEM_NUMBER","SHELVING_KEY","CLASS","NUMBER_OF_COPIES","NUMBER_OF_CALL_HOLDS","NUMBER_ON_RESERVE","ANALYTIC_POSITION","NUMBER_OF_VISIBLE_COPIES","LAST_COPY_BOOKED","SHADOW","BOUND_WITH","NUMBER_OF_RESERVE_CONTROLS","SYSDATE_MODIFIED"
4084116,1,21,"M0699","M0699",6,1,0,0,0,1,0,0,0,0,19-MAR-2016 22:52:39
4084116,2,21,"M0699 BOX 1","M000699 BOX 000001",9,1,0,0,5,1,0,0,0,0,14-APR-2023 08:42:56
4084116,3,21,"M0699 BOX 2","M000699 BOX 000002",9,1,0,0,5,1,0,0,0,0,14-APR-2023 08:43:05
4084116,4,21,"M0699 BOX 3","M000699 BOX 000003",9,1,0,0,5,1,0,0,0,0,14-APR-2023 08:43:12

The value of ANALYTIC_POSITION for the box rows is 5. Count 5 characters from 1 in M0699 and you land on the final 9, not after it. This is likely why selcallnum -oB thinks the base call number is "M069" and not "M0699".

Looking at the callnum table data for another record, where there is no lopping happening in folio, the ANALYTIC_POSITION is 7:

"CATALOG_KEY","CALL_SEQUENCE","LIBRARY","ITEM_NUMBER","SHELVING_KEY","CLASS","NUMBER_OF_COPIES","NUMBER_OF_CALL_HOLDS","NUMBER_ON_RESERVE","ANALYTIC_POSITION","NUMBER_OF_VISIBLE_COPIES","LAST_COPY_BOOKED","SHADOW","BOUND_WITH","NUMBER_OF_RESERVE_CONTROLS","SYSDATE_MODIFIED"
4085056,530,21,"SC0112 ACCN 1992-129 BOX 1","SC 000112 ACCN 001992-000129 BOX 000001",8,1,0,0,7,1,0,0,0,0,10-APR-2023 11:49:42

For this one, count to 7 from 1, and you end up after 2 in SC0112 (at the space).
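To make the counting concrete, here is a minimal sketch of a split rule that reproduces the output seen in this thread. It is only an assumption inferred from these two records, not documented selcallnum behavior: the base is taken as everything before the 1-indexed ANALYTIC_POSITION, the character at that position is treated as a separator and dropped, and the rest is the analytic. The function name is hypothetical.

```python
def split_at_analytic_position(call_number: str, analytic_position: int) -> tuple[str, str]:
    """Split a call number into (base, analytic) under the ASSUMED rule above.

    analytic_position is 1-indexed, as in the callnum table. A value of 0
    is taken to mean "no analytic", so the whole call number is the base.
    """
    if analytic_position <= 0:
        return call_number, ""
    base = call_number[: analytic_position - 1]   # characters before the position
    analytic = call_number[analytic_position:]    # characters after the position
    return base, analytic


# Position 5 lands on the final "9", so the base loses it.
print(split_at_analytic_position("M0699 BOX 1", 5))
# Position 7 lands on the space, so the base stays intact.
print(split_at_analytic_position("SC0112 ACCN 1992-129 BOX 1", 7))
```

Under this assumed rule, the first call prints a base of "M069" with analytic " BOX 1", matching the lopped extract, while the second keeps "SC0112" whole.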

shelleydoljack commented 1 year ago

@cbeer it might be helpful to know how many are like this so we can see if Data Control can fix them before we migrate. Otherwise, I'm not sure how we'd find them.

shelleydoljack commented 1 year ago

I had a thought: we could add yet another column to the item tsv that holds the full call number (-oD flag), in addition to the columns where selcallnum splits out the base call number and the analytic. Maybe it could be used to see if the base call number is missing its last character 🤷
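A rough sketch of that comparison, assuming the rows are pipe-delimited and that, once -oD is appended to the selcallnum output flags, the base call number and the full call number sit at field indexes 13 and 15 (counted from 0). The function names and the index defaults are hypothetical and would need adjusting to the real item tsv layout. The check is a heuristic: it flags a row when the full call number continues with a non-space character right after the base.

```python
def is_base_lopped(base: str, full: str) -> bool:
    """Heuristic: True when `full` starts with `base` but the next
    character is not whitespace, i.e. the base looks cut mid-token."""
    base, full = base.strip(), full.strip()
    if not base or base == full or not full.startswith(base):
        return False
    return not full[len(base)].isspace()


def find_lopped(lines, base_idx=13, full_idx=15):
    """Return (catalog key, base, full call number) for suspicious rows.

    base_idx and full_idx are ASSUMED field positions, not a documented layout.
    """
    flagged = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= full_idx:
            continue  # skip short or malformed rows
        if is_base_lopped(fields[base_idx], fields[full_idx]):
            flagged.append((fields[0], fields[base_idx], fields[full_idx]))
    return flagged


rows = [
    "4084116|1|1|4084116-1001    |SPEC-COLL|MANUSCRIPT|MANUSCRIPT|MANUSCRIPT||TAX=8.25|0|1|ASIS|M0699||M0699|0|MANUSCRPT|0|",
    "4084116|2|1|36105116508214  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M069| BOX 1|M0699 BOX 1|0|MANUSCRPT|0|",
    "4084116|2|1|36105116508214  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M0699|BOX 1|M0699 BOX 1|0|MANUSCRPT|0|",
]
for hit in find_lopped(rows):
    print(hit)
```

Call numbers that legitimately break without a space between base and analytic would also be flagged, so the hits are candidates for review rather than confirmed errors.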

shelleydoljack commented 1 year ago

echo 4084116 | trs selitem -iC  -m"~WITHDRAWN" -oNIBylmteg26 2> /dev/null | trs selcallnum -iN -oCSpBZD2 2> /dev/null | trs selcatalog -iC -oSf6 2> /dev/null
4084116|1|1|4084116-1001    |SPEC-COLL|MANUSCRIPT|MANUSCRIPT|MANUSCRIPT||TAX=8.25|0|1|ASIS|M0699||M0699|0|MANUSCRPT|0|
4084116|2|1|36105116508214  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M069| BOX 1|M0699 BOX 1|0|MANUSCRPT|0|
4084116|3|1|36105116508222  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M069| BOX 2|M0699 BOX 2|0|MANUSCRPT|0|
4084116|4|1|36105116508230  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M069| BOX 3|M0699 BOX 3|0|MANUSCRPT|0|

cbeer commented 1 year ago

I've seen that one and maybe a4337581 in my random sampling, but your item tsv dump seems like a better way to find out.

shelleydoljack commented 1 year ago

Darsi edited the call num records for 4084116 by removing the spaces after the |Z and selcallnum now extracts the data correctly:

echo 4084116 | trs selitem -iC  -m"~WITHDRAWN" -oNIBylmteg26 2> /dev/null | trs selcallnum -iN -oCSpBZD2 2> /dev/null | trs selcatalog -iC -oSf6 2> /dev/null
4084116|1|1|4084116-1001    |SPEC-COLL|MANUSCRIPT|MANUSCRIPT|MANUSCRIPT||TAX=8.25|0|1|ASIS|M0699||M0699|0|MANUSCRPT|0|
4084116|2|1|36105116508214  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M0699|BOX 1|M0699 BOX 1|0|MANUSCRPT|0|
4084116|3|1|36105116508222  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M0699|BOX 2|M0699 BOX 2|0|MANUSCRPT|0|
4084116|4|1|36105116508230  |SPEC-COLL|MSS-30|MSS-30|NONCIRC|||0|1|ALPHANUM|M0699|BOX 3|M0699 BOX 3|0|MANUSCRPT|0|

We need a script to see how many records are like this and try to fix them before migration. https://github.com/sul-dlss/FOLIO-Project-Stanford/issues/470
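One possible starting point, offered only as a sketch and not as the script tracked in the linked issue: scan a CSV export of the callnum table and flag rows where the character at ANALYTIC_POSITION (1-indexed) is not whitespace, which is the pattern seen on ckey 4084116. The function name is hypothetical, and the sample below keeps only the four columns the check needs.

```python
import csv
import io


def find_suspect_rows(csv_text: str):
    """Return (catalog key, call sequence, call number) for rows whose
    ANALYTIC_POSITION points at a non-space character.

    ASSUMPTION: a position of 0 means the row has no analytic.
    """
    suspects = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        call_number = row["ITEM_NUMBER"]
        position = int(row["ANALYTIC_POSITION"])
        if 0 < position <= len(call_number) and not call_number[position - 1].isspace():
            suspects.append((row["CATALOG_KEY"], row["CALL_SEQUENCE"], call_number))
    return suspects


# Trimmed sample built from the rows quoted earlier in this thread.
SAMPLE = '''"CATALOG_KEY","CALL_SEQUENCE","ITEM_NUMBER","ANALYTIC_POSITION"
4084116,1,"M0699",0
4084116,2,"M0699 BOX 1",5
4085056,530,"SC0112 ACCN 1992-129 BOX 1",7
'''

for suspect in find_suspect_rows(SAMPLE):
    print(suspect)
```

On this sample only the M0699 BOX 1 row is reported. Whether every flagged row is actually wrong would still need checking by hand.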