ycba-cia / blacklight-collections2

Creators' names harmonization #100

Open edgartdata opened 5 years ago

edgartdata commented 5 years ago

@flapka Could we think of a way to harmonize creators' names between the library data and the art data so that there are no duplicate names in the Creator facet? @yulgit1 and I were thinking that referencing published authorities, such as LOC, VIAF, ULAN, and ODNB, might be the way to go. We currently have only 47 names in TMS with an LOC ID and 150 with a VIAF ID. We have many more for ULAN, ODNB, and Wikidata.
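To make the idea concrete, here is a minimal sketch of how a shared authority identifier could collapse duplicate facet entries. Everything in it is an assumption for illustration: the record layout, the `harmonize` helper, the `example:1` identifier, and the choice to prefer the library form of the name are hypothetical, not how the catalog currently works.

```python
from collections import defaultdict


def harmonize(records, preferred_source="library"):
    """Map each (source, name) pair to a single facet label.

    Records that share an authority_id get one label; records without
    an authority_id keep their own name. The record layout is hypothetical.
    """
    groups = defaultdict(list)
    for rec in records:
        if rec["authority_id"]:
            groups[rec["authority_id"]].append(rec)

    labels = {}
    for rec in records:
        # Records with no authority link form a group of one.
        group = groups.get(rec["authority_id"], [rec])
        preferred = [r for r in group if r["source"] == preferred_source]
        chosen = (preferred or group)[0]
        labels[(rec["source"], rec["name"])] = chosen["name"]
    return labels


# Made-up sample data; "example:1" is a placeholder, not a real authority ID.
records = [
    {"source": "library",
     "name": "Turner, J. M. W. (Joseph Mallord William), 1775-1851",
     "authority_id": "example:1"},
    {"source": "art",
     "name": "Joseph Mallord William Turner",
     "authority_id": "example:1"},
    {"source": "art", "name": "Unlinked local hero", "authority_id": None},
]
labels = harmonize(records)
```

With this sample, both Turner records resolve to the same label, while the unlinked name stays as it is and would still need reconciliation.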

edgartdata commented 5 years ago

@edgartdata to set up a meeting about this with Michael, Eric, and Francis.

edgartdata commented 5 years ago

Meeting scheduled for Monday May 13 at 3:15 at Crown.

edgartdata commented 5 years ago

The current hard-coded ULAN-LC matching mechanism does not return matches when the LC authority records do not include life dates; those matches need human expertise. @edgartdata to add the 2,200 LC-ULAN matches to TMS so they can be published in the Creator facet, without attribution qualifiers (and to give Eric the list of attribution qualifiers to filter on). @edgartdata and @flapka to compare the remaining 1,200 local heroes and prioritize them for reconciliation over the summer. Rumor has it that LC will start linking to Wikidata in a few weeks.
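For illustration only, a matcher could route the date-less cases to a review queue instead of dropping them. This is a sketch under assumptions, not the existing mechanism: the `normalize` and `classify` helpers, the record fields, and the sample values are all hypothetical.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip diacritics, and drop commas and periods."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().replace(",", " ").replace(".", " ").split())


def classify(lc, ulan):
    """Return 'match', 'needs_review', or 'no_match' for one LC/ULAN pair."""
    if normalize(lc["name"]) != normalize(ulan["name"]):
        return "no_match"
    lc_dates = (lc.get("birth"), lc.get("death"))
    ulan_dates = (ulan.get("birth"), ulan.get("death"))
    # Names agree but a life date is missing: leave it to a human.
    if None in lc_dates or None in ulan_dates:
        return "needs_review"
    return "match" if lc_dates == ulan_dates else "no_match"


# Made-up sample records for the three outcomes.
ulan_rec = {"name": "Turner, J.M.W.", "birth": 1775, "death": 1851}
with_dates = classify({"name": "Turner, J. M. W.", "birth": 1775, "death": 1851}, ulan_rec)
without_dates = classify({"name": "Turner, J. M. W."}, ulan_rec)
other_name = classify({"name": "Constable, John", "birth": 1776, "death": 1837}, ulan_rec)
```

The point of the middle outcome is that a name-only agreement is kept as a candidate for expert review rather than silently discarded.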

edgartdata commented 5 years ago

@edgartdata to share analysis of 200 LC-ULAN matches done recently.

edgartdata commented 5 years ago

Update: LC now publishes Wikidata URIs when Wikidata includes LC links. How did Wikidata match with LC? What were the criteria?

edgartdata commented 5 years ago

Update: @edgartdata provided @flapka and @yulgit1 with a document checking the accuracy of 200 LC-ULAN matches recently done programmatically.

edgartdata commented 5 years ago

@flapka @yulgit1 Here is the list of artists who do not have a machine-actionable link in TMS to any authority (including ULAN, LC, VIAF, ODNB...), i.e. these are our TMS local heroes. Some of them have a ULAN link noted in the TMS Biography field, but because this field holds a lot of legacy data we are not publishing its content. Instead, we use another TMS field to pass the ULAN and LC links to LIDO XML. YCBA local heroes.xlsx

I could refine this and see how many works we have by each artist. That may be a useful way to prioritize which ones to tackle first.

edgartdata commented 5 years ago

@flapka Ok, so let me try and summarize what we have agreed on so far for entity reconciliation/name harmonization for the Creator facet:

Are the LC names already captured in our MARC records? For example, http://10.5.96.187:3000/catalog/orbis:3585313 Views of Essex has Turner noted this way: Turner, J. M. W. (Joseph Mallord William), 1775-1851. Is that the LC format? (LC is down right now!)
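If headings in the MARC records do follow the pattern seen in the Turner example (name, optional fuller form in parentheses, optional life dates), they could be split into parts for comparison with the art data. The sketch below only assumes that pattern; the `parse_heading` helper and its field names are hypothetical, and it does not confirm that this is the LC format.

```python
import re

# Pattern assumed from the single Turner example above.
HEADING = re.compile(
    r"^(?P<name>[^()]+?)"
    r"(?:\s*\((?P<fuller>[^)]+)\))?"
    r"(?:,\s*(?P<birth>\d{4})?-(?P<death>\d{4})?)?"
    r"\.?$"
)


def parse_heading(heading):
    """Split a heading into name, fuller form, and life dates (None if absent)."""
    m = HEADING.match(heading.strip())
    if m is None:
        return None
    return {
        "name": m.group("name").strip(),
        "fuller_form": m.group("fuller"),
        "birth": int(m.group("birth")) if m.group("birth") else None,
        "death": int(m.group("death")) if m.group("death") else None,
    }


turner = parse_heading("Turner, J. M. W. (Joseph Mallord William), 1775-1851")
```

Headings that lack dates come back with `birth` and `death` set to `None`, which is the case the matching discussion above flags as needing human review.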

sequence:

flapka commented 5 years ago

> Update: @edgartdata provided @flapka and @yulgit1 with a document checking the accuracy of 200 LC-ULAN matches recently done programmatically.

Thanks, Emmanuelle. Can you confirm: is this the Excel document titled "ulan_lc_mapping", shared on August 9?

I've spent about 10 minutes with it, looking especially at the pairs treated as non-matches. For the 10 or so random pairs that I've spot-checked, I'd have reached different conclusions: they all look like matches (99% confidence), with expected variations (one source might say "British", the other "Scottish").

flapka commented 5 years ago

For local heroes:

> I could refine this and see how many works we have by each artist. That may be a useful way to prioritize which one to tackle first?

I think sorting by number of works is a good idea.
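A rough sketch of that sort, assuming we can export one creator name per work from TMS; the `prioritize` helper and the sample names are made up for illustration:

```python
from collections import Counter


def prioritize(local_heroes, work_creators):
    """Rank local heroes by number of works, most works first.

    work_creators holds one creator name per work (hypothetical export).
    Ties are broken alphabetically so the order is stable.
    """
    counts = Counter(work_creators)
    pairs = ((name, counts[name]) for name in local_heroes)
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


# Made-up sample data.
local_heroes = ["Artist A", "Artist B", "Artist C"]
work_creators = ["Artist B", "Artist B", "Artist A"]
ranking = prioritize(local_heroes, work_creators)
```

Names with no works end up at the bottom of the list with a count of zero, so they stay visible without crowding the top of the queue.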

flapka commented 5 years ago

@edgartdata To your last set of questions:

edgartdata commented 4 years ago

@flapka Just a note to discuss progress on this in P&D and RB.

edgartdata commented 3 years ago

Update: The CHIT Bias Awareness and Responsibility Committee has decided to craft a questionnaire for living artists represented in the collections of the LUX initiative. The data reconciliation will maintain a database for the resulting data. In the future there may or may not be a feature that allows artists to update their descriptions themselves whenever and however they want. The resulting information will be transformed by the data reconciliation process into structured data and made available to CHIT partners.

(this is also mentioned in https://github.com/ycba-cia/TMS/issues/25)