sul-dlss / FOLIO-Project-Stanford

Task management for Stanford’s analysis of FOLIO.

itemcat to stat codes for Instances with no items #416

Closed ahafele closed 1 year ago

ahafele commented 1 year ago

We discovered that E-THESIS is not getting assigned as a stat code on migration because these records don't have items.

We'd also like to assign Database as a stat code but these records don't have items either.

@dlrueda want to add to this ticket and then we can take DRAFT off?

dlrueda commented 1 year ago

I think we need a way to add an instance-level stat code to the instances listed in a file of ckeys/HRIDs, maybe with two columns: ckey in the 1st column and E-THESIS/DATABASE in the 2nd. This can be generated as part of the export: we can grep for E-THESIS and DATABASE in the item data and produce a file of catkeys. We could then create one file for every ckey range if that’s what the DAG wants, or one big file to handle as a post-process step.

@jermnelson what would be best for processing?

dlrueda commented 1 year ago

Jeremy and Darsi discussed. A file (per range) named ckeys_09300000_09349999.intstatcode.tsv with ckey in 1st column, and E-THESIS, LEVEL3-CAT, LEVEL3OCLC, MARCIVE, DATABASE in 2nd column.

He can create instance level stat code from that.

dlrueda commented 1 year ago

But first, will ask Access Team if an instance level stat code is something they can use

dlrueda commented 1 year ago

Seems like yes, Access Team can see (and therefore use) stat code at instance level

ahafele commented 1 year ago

Related to #359

@jermnelson this is ready now.

shelleydoljack commented 1 year ago

The boundwith tsv files have the item stat codes, so one question is: should these also be included in the irec.tsv files, or can they be mapped to the holdings from the boundwith tsv too?

dlrueda commented 1 year ago

Also, MARCIVE should be an instance-level stat code for INTERNET items.

dlrueda commented 1 year ago

In items_all_w_internet file

Look for E-THESIS, LEVEL3-CAT, LEVEL3OCLC, MARCIVE in item cat1.

Look for DATABASE in item type.

then split into the 50K ckey ranges named range.intstatcode.tsv
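The filter-and-split step described above could be sketched roughly as follows. This is a minimal Python illustration, not the actual extract script: the row layout (a tuple of ckey, item type, item cat1) and the function names are assumptions, and the range label follows the ckeys_09300000_09349999 pattern mentioned earlier in the thread.

```python
from collections import defaultdict

# Values named in the thread, and the column each is looked up in.
CAT1_CODES = {"E-THESIS", "LEVEL3-CAT", "LEVEL3OCLC", "MARCIVE"}
ITEM_TYPE_CODES = {"DATABASE"}
RANGE_SIZE = 50_000


def ckey_range(ckey: int) -> str:
    """Return the 50K range label for a ckey, e.g. 9300123 -> '09300000_09349999'."""
    start = (ckey // RANGE_SIZE) * RANGE_SIZE
    end = start + RANGE_SIZE - 1
    return f"{start:08d}_{end:08d}"


def split_by_range(items):
    """Keep only items carrying one of the wanted values and group them by range.

    `items` is an iterable of (ckey, item_type, item_cat1) tuples; this layout
    is hypothetical and stands in for rows of the items_all_w_internet file.
    """
    by_range = defaultdict(list)
    for ckey, item_type, item_cat1 in items:
        if item_type in ITEM_TYPE_CODES or item_cat1 in CAT1_CODES:
            by_range[ckey_range(int(ckey))].append((ckey, item_type, item_cat1))
    return dict(by_range)
```

Each key of the returned dictionary would correspond to one per-range output file.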

dlrueda commented 1 year ago

@jermnelson most records have either DATABASE (in the item type field) OR one of the other values (E-THESIS, LEVEL3-CAT, LEVEL3OCLC, MARCIVE) in the item cat1 field.

But 10 total records have both itype DATABASE and itemcat1 MARCIVE.

Do you want to handle both codes for these 10 records? (We could just bring over DATABASE and have staff add the MARCIVE stat code manually to those 10 if it’s too hard to code quickly to handle both.)

If you want to handle those 10, how do you want the file? Do you want two lines for the ckeys that have both DATABASE and MARCIVE:

10361029|DATABASE|
10361029|MARCIVE|

or do you want the file to be in this form:

ckey|itemtype|itemcat1|
10361029|DATABASE|MARCIVE|
14177498|DATABASE||
10734787||LEVEL3OCLC|

(so, usually a blank field in either the 2nd or 3rd field, but one line per ckey)

jermnelson commented 1 year ago

@dlrueda I think the last format of ckey|itemtype|itemcat1| with one line per ckey would be the easiest to handle and assign. Thanks!
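The one-line-per-ckey layout chosen here might be produced along these lines. This is a minimal Python sketch under assumptions: the input tuples, the function name `build_rows`, and the policy of keeping the first wanted value seen for each column are illustrative and not taken from the actual extract scripts.

```python
def build_rows(items):
    """Collapse (ckey, item_type, item_cat1) tuples into 'ckey|itemtype|itemcat1|' lines.

    Only DATABASE is kept from the item type column and only the four cat1
    values named in the thread are kept from item cat1; anything else becomes
    an empty field. The input layout is hypothetical.
    """
    cat1_codes = {"E-THESIS", "LEVEL3-CAT", "LEVEL3OCLC", "MARCIVE"}
    merged = {}  # ckey -> [itemtype, itemcat1]; dicts preserve insertion order
    for ckey, item_type, item_cat1 in items:
        fields = merged.setdefault(ckey, ["", ""])
        if item_type == "DATABASE" and not fields[0]:
            fields[0] = item_type
        if item_cat1 in cat1_codes and not fields[1]:
            fields[1] = item_cat1
    # Skip ckeys where neither column carried a wanted value.
    return [f"{ckey}|{t}|{c}|" for ckey, (t, c) in merged.items() if t or c]
```

A ckey with both DATABASE and MARCIVE ends up on a single line with both fields filled, while the common case leaves one of the two fields blank.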

ahafele commented 1 year ago

@dlrueda does this need an additional export related ticket?

dlrueda commented 1 year ago

I feel like this is the export related ticket, if by export you mean add it to the Symphony extract for migration?

ahafele commented 1 year ago

Database Instance Statistical code has been created on -test

shelleydoljack commented 1 year ago

Created script /s/SUL/Bin/folio_symphony_extract/Bibs/item_to_inst_statcode.ksh. Updated /s/SUL/Bin/folio_symphony_extract/Bibs/generate_marc_items_tsv.ksh to call an updated items_to_50K_ranges.pl that uses the output file of item_to_inst_statcode.ksh. @dlrueda the lines I added to generate_marc_items_tsv.ksh should also be added to new_generate_marc_items_tsv.ksh. I can do that if you want; I just wanted to let you know first.

Still need to address the itemxinfo notes for when we don't create folio items.

shelleydoljack commented 1 year ago

Oh wait, I still need to call the item_to_inst_statcode.ksh script from the generate marc one. I thought the script would have to have more logic, but it really doesn't, so I think I'm going to incorporate it into the generate marc script instead.

dlrueda commented 1 year ago

@jermnelson final form of file names is ckeys_$ckeyrange.instatcode.tsv
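On the consuming side, a file in this final form could be read back into a ckey-to-codes mapping roughly like this. It is a sketch only: the function names are hypothetical, it assumes $ckeyrange looks like the 09300000_09349999 example and that rows use the ckey|itemtype|itemcat1| layout agreed earlier in the thread, and it does not reflect how the migration DAG actually loads the file.

```python
import re

# Assumed shape of the final file name: ckeys_<start>_<end>.instatcode.tsv
FILENAME_RE = re.compile(r"^ckeys_(\d+_\d+)\.instatcode\.tsv$")


def range_from_filename(name: str):
    """Return the ckey range embedded in the file name, or None if it does not match."""
    match = FILENAME_RE.match(name)
    return match.group(1) if match else None


def parse_rows(lines):
    """Map each ckey to the list of non-empty stat code fields in its row."""
    codes_by_ckey = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("ckey|"):  # skip blanks and an optional header row
            continue
        ckey, *fields = line.split("|")
        codes_by_ckey[ckey] = [f for f in fields if f]
    return codes_by_ckey
```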

shelleydoljack commented 1 year ago

The instatcode.tsv files are created in the generate marc and items script. This is completed.