wcmc-its / ReCiter

ReCiter: an enterprise open source author disambiguation system for academic institutions
Apache License 2.0

Failure to score article first name in cases where institutional first name contains space or dash #476

Closed paulalbert1 closed 3 years ago

paulalbert1 commented 3 years ago

Problem

In cases where `institutionalAuthorNameFirstName` contains a `-` or a space, ReCiter fails to score the first name. You can tell because the score `nameMatchFirstType` is missing from the Feature Generator output. Presumably, by default, `nameMatchFirstScore` is set to 0. (In such circumstances, the reporting database misleadingly says `nameMatchFirstType = full-exact`, which is not an accurate representation of the data in Feature Generator.)

Scope: this affects 800+ records for WCM full-time faculty (n=2,000).
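
One way to find affected records is to scan Feature Generator output for `authorNameEvidence` objects that lack `nameMatchFirstType`. Below is a minimal Python sketch (ReCiter itself is Java). The function name `find_unscored_first_names` and the record shape are assumptions for illustration, based only on the fields shown in the examples in this issue.

```python
from typing import Any, Dict, Iterable, List, Tuple


def find_unscored_first_names(
    records: Iterable[Dict[str, Any]],
) -> List[Tuple[str, int]]:
    """Return (personIdentifier, pmid) for records with no nameMatchFirstType.

    Assumed record shape (hypothetical): a dict with "personIdentifier",
    "pmid", and "authorNameEvidence", the latter optionally nested
    under an "evidence" key.
    """
    flagged = []
    for record in records:
        # Accept both the flat shape and the shape nested under "evidence".
        container = record.get("evidence", record)
        name_evidence = container.get("authorNameEvidence")
        if name_evidence is not None and "nameMatchFirstType" not in name_evidence:
            flagged.append((record["personIdentifier"], record["pmid"]))
    return flagged
```

Running this over a full export would give a list to compare against the 800+ estimate above.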

Desired behavior

This should be an easy-ish fix. For this type of name, I recommend replicating this block of code.
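
The referenced block of code is not reproduced here, so the following is only a sketch of one possible approach, not ReCiter's actual logic. It is written in Python for brevity and splits the first name on spaces, dashes, and periods before comparing. The helper names and the labels `first-token-exact` and `no-match` are hypothetical. Only `full-exact` appears in the output quoted in this issue.

```python
import re
from typing import List

# Spaces, periods, and dashes all count as separators inside a first name.
_SEPARATORS = re.compile(r"[\s.\-]+")


def name_tokens(name: str) -> List[str]:
    """Lowercase a name and split it into its non-empty parts."""
    return [t for t in _SEPARATORS.split(name.strip().lower()) if t]


def first_name_match_type(institutional: str, article: str) -> str:
    """Always return a match type, so the field is never missing."""
    inst = name_tokens(institutional)
    art = name_tokens(article)
    if not inst or not art:
        return "no-match"  # hypothetical label
    # "Ben-gary", "Ben Gary", and "Bengary" all collapse to "bengary".
    if "".join(inst) == "".join(art):
        return "full-exact"
    # Fall back to comparing only the leading part of each name.
    if inst[0] == art[0]:
        return "first-token-exact"  # hypothetical label
    return "no-match"  # hypothetical label
```

Under these assumptions, `first_name_match_type("Ben-gary", "Ben Gary")` yields `full-exact`, and a pair such as `"P. anne"` and `"Allison"` yields an explicit `no-match` instead of an absent field.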

Examples

"personIdentifier": "bgharvey",
"pmid": 32936026,
"authorNameEvidence": {
  "institutionalAuthorName": {
    "firstName": "Ben-gary",
    "firstInitial": "B",
    "lastName": "Harvey"
  },
  "articleAuthorName": {
    "firstName": "Bartholomew S J",
    "firstInitial": "B",
    "lastName": "Harvey"
  },
  "nameScoreTotal": 0.664,
  "nameMatchFirstScore": 0,
  "nameMatchMiddleScore": 0,
  "nameMatchLastType": "full-exact",
  "nameMatchLastScore": 0.664,
  "nameMatchModifierScore": 0
},

"personIdentifier": "amcbride",
"pmid": 34283775,
"evidence": {
  "authorNameEvidence": {
    "institutionalAuthorName": {
      "firstName": "P. anne",
      "firstInitial": "P",
      "lastName": "Mcbride"
    },
    "articleAuthorName": {
      "firstName": "Allison",
      "firstInitial": "A",
      "lastName": "McBride"
    },
    "nameScoreTotal": 0.664,
    "nameMatchFirstScore": 0,
    "nameMatchMiddleScore": 0,
    "nameMatchLastType": "full-exact",
    "nameMatchLastScore": 0.664,
    "nameMatchModifierScore": 0
  },

"personIdentifier": "amc2056",
"pmid": 34145622,
"authorNameEvidence": {
  "institutionalAuthorName": {
    "firstName": "Augustine m. k.",
    "firstInitial": "A",
    "lastName": "Choi"
  },
  "articleAuthorName": {
    "firstName": "Anthony J",
    "firstInitial": "A",
    "lastName": "Choi"
  },
  "nameScoreTotal": 0.664,
  "nameMatchFirstScore": 0,
  "nameMatchMiddleScore": 0,
  "nameMatchLastType": "full-exact",
  "nameMatchLastScore": 0.664,
  "nameMatchModifierScore": 0
},