majlis-erc / majlis-data


Reconcile person strict list #73

Closed: mMoliere closed this 9 months ago

mMoliere commented 1 year ago

Load the TSV into OpenRefine and:

  1. Use the reconciliation feature to pull IDs from other services.
  2. Pull alternate names and create fields for the corresponding source attributes.
  3. Eliminate duplicates.
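Step 2 can be pictured as flattening each reconciled person into one row per name variant, with the service it came from kept as a source field. The sketch below assumes a hypothetical record shape (a dict with `viaf`, `isni` and `wikidata` keys) and hypothetical names (`Variant`, `variants_from_record`); it is not what OpenRefine produces, only an illustration of the target structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    language: Optional[str]  # None when the source does not encode a language
    source: str              # service the variant came from, e.g. "VIAF"


def variants_from_record(record: dict) -> list[Variant]:
    """Flatten one reconciled person into variant rows.

    Hypothetical record shape:
    {"viaf": ["Name A", ...], "isni": [...], "wikidata": [("Name B", "fr"), ...]}
    """
    rows = []
    # VIAF and ISNI variants carry no language information
    for name in record.get("viaf", []):
        rows.append(Variant(name, None, "VIAF"))
    for name in record.get("isni", []):
        rows.append(Variant(name, None, "ISNI"))
    # Wikidata variants come as (name, language code) pairs
    for name, lang in record.get("wikidata", []):
        rows.append(Variant(name, lang, "Wikidata"))
    return rows
```

Keeping `language` as `None` rather than an empty string makes it explicit which variants lack a language, which matters for the data quality gaps noted below.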

Questions:

  1. How should we handle the source attributes of duplicates?
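One possible answer, not a decision recorded in this issue: instead of dropping duplicates, merge them and keep the union of their sources. The sketch assumes variant rows are `(name, language, source)` tuples and that two names count as duplicates when they match after Unicode NFC normalization, case folding and whitespace collapsing; `normalize` and `merge_duplicates` are hypothetical helpers.

```python
import unicodedata


def normalize(name: str) -> str:
    # NFC-normalize, case-fold and collapse runs of whitespace
    return " ".join(unicodedata.normalize("NFC", name).casefold().split())


def merge_duplicates(rows):
    """Merge (name, language, source) rows that share a normalized name and language.

    The first spelling seen is kept; all sources are collected and sorted.
    """
    merged = {}
    for name, language, source in rows:
        key = (normalize(name), language)
        if key not in merged:
            merged[key] = {"name": name, "language": language, "sources": set()}
        merged[key]["sources"].add(source)
    return [
        {"name": v["name"], "language": v["language"], "sources": sorted(v["sources"])}
        for v in merged.values()
    ]
```

Here a variant without a language is kept apart from the same spelling with a language, since the two may not be the same variant. Folding language-less rows into language-tagged ones would be the alternative. For a TSV, the `sources` list could be joined with a separator such as `|` to form a multi-valued cell.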

Data quality:

  1. VIAF does not encode the language of name variants.
  2. ISNI takes many of its alternate names from VIAF, so it does not encode languages either. On the plus side, it stores links to Wikipedia and Wikidata. Its MARC records are more extensive than the derived JSON and RDF files.
  3. Wikidata does encode the language of variants, but not for all persons.
  4. Wikipedia language editions are distinguishable by their URL subdomain, so a name plus its language code could be pulled from the Wikipedia URLs provided by Wikidata and others. Names from other sources will be given without a language code.
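Pulling a name and language from a Wikipedia URL could look like the sketch below. It assumes article URLs of the form `https://<lang>.wikipedia.org/wiki/<Title>` (mobile hosts such as `de.m.wikipedia.org` also work), and `name_from_wikipedia_url` is a hypothetical helper. Two caveats: the subdomain is not always a plain ISO language code (e.g. `simple`), and article titles may carry disambiguation suffixes in parentheses, so the result still needs review.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def name_from_wikipedia_url(url: str) -> Optional[tuple[str, str]]:
    """Return (title, language subdomain) for a Wikipedia article URL, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith(".wikipedia.org") or not parts.path.startswith("/wiki/"):
        return None
    # The first host label is the language edition, e.g. "de" in de.wikipedia.org
    lang = host.split(".")[0]
    # Decode percent-escapes and turn underscores back into spaces
    title = unquote(parts.path[len("/wiki/"):]).replace("_", " ")
    return title, lang
```

For example, `https://fr.wikipedia.org/wiki/Ibn_Khald%C3%BBn` yields the title `Ibn Khaldûn` with language `fr`.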