Open kilahimm opened 1 year ago
My thoughts on your questions:
The matches in 1-to-1-matches-with-call-number all seem good. The few issues that I found aren't really issues that will impact our project. For example, in RG002, Office of the President records, I found that there are multiple TLC containers that represent the same physical box. We've (correctly) matched 1 barcode to multiple TLC records. Pushing the barcode to all of the TLC records would be fine since they all represent the same physical box with the same barcode. However, we should consult with Jen, since this may be something that we want to clean up instead of just ignoring.
Diving deeper into the duplicates, here are my findings so far:
Bad Aspace Data - Incorrect linkage between AOs and TLCs
Examples of this are most evident in the duplicates for the David A Clarke records. See this TLC record in the Aspace PUI. Notice how folder #s are repeated. AOs in series 2 were incorrectly linked to TLCs from series 1. This messes with the TLC component field that we are using to match w/ Alma enum/call numbers. There's probably some logic we could come up with to overcome this issue, but we should probably address the core issue instead.
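One way to find candidates for this kind of mislinkage might be to list every TLC whose linked AOs come from more than one series. This is only a rough sketch: the `(tlc_id, series, folder)` tuples and the IDs are made up for illustration, and how you would actually pull AO/TLC links out of Aspace is not covered here. A box can legitimately hold folders from two series, so the output is a review list, not a list of confirmed errors.

```python
from collections import defaultdict


def tlcs_spanning_series(links):
    """Return {tlc_id: set_of_series} for TLCs linked to AOs from more than one series.

    `links` is an iterable of (tlc_id, series, folder) tuples. This shape is
    hypothetical and is not the Aspace export format.
    """
    series_by_tlc = defaultdict(set)
    for tlc_id, series, _folder in links:
        series_by_tlc[tlc_id].add(series)
    return {tlc: found for tlc, found in series_by_tlc.items() if len(found) > 1}


# Made-up links: tlc-1 carries folder 1 from both series 1 and series 2,
# which is the "repeated folder #s" symptom described above.
links = [
    ("tlc-1", "1", "1"),
    ("tlc-1", "1", "2"),
    ("tlc-1", "2", "1"),
    ("tlc-2", "2", "3"),
]
suspect = tlcs_spanning_series(links)
print(suspect)
```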
Unprocessed boxes
The Greater Washington Board of Trade records are a good example of this. This series, Unprocessed 1997 Accretion, is represented by this holdings in Alma, 22638551460004107. The unprocessed boxes represented in Alma do not have corresponding TLCs in Aspace. Many (all?) of the duplicates for this resource ID are caused by boxes from the described portions of the collection matching with boxes from that unprocessed accretion because the box numbers are not unique. A simple fix would be to exclude the items on the 1997 UP holdings.
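The exclusion could be as small as a filter applied to the Alma items before matching. A minimal sketch, assuming each item is a dict with a `holdings_id` key (that key name and the sample rows are made up; only the holdings ID 22638551460004107 comes from the comment above):

```python
# Holdings ID for the Unprocessed 1997 Accretion, as quoted above.
UNPROCESSED_HOLDINGS = {"22638551460004107"}


def exclude_holdings(items, excluded=UNPROCESSED_HOLDINGS):
    """Return only the Alma items whose holdings ID is not in `excluded`."""
    return [item for item in items if item["holdings_id"] not in excluded]


# Made-up items: two boxes numbered "Box 1", one on the unprocessed holdings.
items = [
    {"barcode": "A", "enum": "Box 1", "holdings_id": "22638551460004107"},
    {"barcode": "B", "enum": "Box 1", "holdings_id": "H-DESCRIBED"},
]
kept = exclude_holdings(items)
print([item["barcode"] for item in kept])
```

Keeping the excluded IDs in a set would make it easy to add other unprocessed holdings later if more turn up.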
Implied Series #
Sometimes the enum or call number in Alma is just 'Box X' with no series information. In some cases, we can assume that these are series 1. For example, this duplication is caused by that scenario:
32882013264968 | Box 1 | 22638529140004107 | B | 1 | RG0085 Series 1 | 27668 | 636 | Faculty Women's Club records | ||
---|---|---|---|---|---|---|---|---|---|---|
32882017682108 | Series 3 Box 1 | 22638529140004107 | B | 1 | RG0085 Series 1 | 27668 | 636 | Faculty Women's Club records | ||
32882018610033 | Box 1 Series 2 | 22638529140004107 | B | 1 | RG0085 Series 1 | 27668 | 636 | Faculty Women's Club records |
This seems very risky though. We may want to manually confirm when this scenario happens.
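To support that manual confirmation, the matching script could flag every enum that has a box number but no series, rather than silently treating it as series 1. A rough sketch, assuming the enum strings look like the three in the table above (the regexes and the function name are illustrative, not what the current script does):

```python
import re

# Illustrative patterns; they accept "Series N" and "Box N" in either order.
SERIES_RE = re.compile(r"\bSeries\s+(\w+)", re.IGNORECASE)
BOX_RE = re.compile(r"\bBox\s+(\w+)", re.IGNORECASE)


def parse_enum(enum):
    """Return (series, box, series_implied) for an Alma enum/call number string.

    series_implied is True when no series appears in the string, which is
    the case that should go to manual review.
    """
    series = SERIES_RE.search(enum)
    box = BOX_RE.search(enum)
    return (
        series.group(1) if series else None,
        box.group(1) if box else None,
        series is None,
    )


for enum in ["Box 1", "Series 3 Box 1", "Box 1 Series 2"]:
    print(enum, parse_enum(enum))
```

With this approach only the first of the three rows ('Box 1') would be held back for review; the other two carry an explicit series and would not match a series 1 TLC.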
See results of preliminary matching on this spreadsheet.
1-to-1-matches-with-call-number shows top containers that matched a single Alma item/barcode after testing against both the item enumeration and holdings-level call number.
duplicates shows top containers that matched on more than one item/barcode. No priority was given, so if a top container matched one item on the enumeration and the call number, and a second item on just the enumeration, both matches are shown in this sheet.
Sheet4 shows the collections and the number of top containers with duplicate matches.
Questions
Do the matches in 1-to-1-matches-with-call-number seem correctly matched? It may be useful to evaluate a random sampling here. 10% would be approximately 800 rows; if divided among 8 people, each person would need to check 100 rows.
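If we go with the sampling, something like the following could draw the 10% and deal it out evenly. This is a sketch only: the 8000-row total is an assumption inferred from "10% would be approximately 800 rows", and the row IDs are placeholders for whatever identifies a row in the sheet.

```python
import random


def assign_sample(row_ids, fraction=0.10, reviewers=8, seed=0):
    """Draw a random sample of row_ids and split it round-robin among reviewers.

    A fixed seed keeps the draw reproducible, so the same rows can be
    regenerated if the assignment list is lost.
    """
    rng = random.Random(seed)
    sample_size = round(len(row_ids) * fraction)
    sample = rng.sample(row_ids, sample_size)
    return [sample[i::reviewers] for i in range(reviewers)]


# Assumed total of 8000 rows; replace with the real row identifiers.
batches = assign_sample(list(range(8000)))
print([len(batch) for batch in batches])
```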