wcmc-its / ReCiter

ReCiter: an enterprise open source author disambiguation system for academic institutions
Apache License 2.0

Manage case where lastName is one character and firstName is multiple characters #529

Open paulalbert1 opened 4 months ago

paulalbert1 commented 4 months ago

This occurs at least 0.03% of the time, and probably more often, since unlikely records are discarded.

See query:


-- Articles with an author whose last name is a single character
-- and whose first name is longer than one character.
SELECT * FROM (
  SELECT
    a.pmid,
    a1.personIdentifier,
    userAssertion,
    GROUP_CONCAT(DISTINCT authorLastName SEPARATOR ', ') AS lastName,
    GROUP_CONCAT(DISTINCT authorFirstName SEPARATOR ', ') AS firstName
  FROM
    person_article a1
  JOIN
    person_article_author a
    ON a1.pmid = a.pmid AND a1.personIdentifier = a.personIdentifier
  WHERE
    CHAR_LENGTH(authorLastName) = 1 AND
    CHAR_LENGTH(authorFirstName) > 1
  GROUP BY
    a.pmid
) x
-- Keep only articles where more than one distinct last name was concatenated.
WHERE lastName LIKE '%,%'
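
For anyone who wants to check the same condition outside the database, here is a minimal Python sketch of the WHERE clause in the query above. It is only an illustration, not ReCiter code: the function name `is_single_char_last_name` and the sample records are hypothetical, and the field names are borrowed from the query's columns. A missing value is treated as a non-match, which is how a NULL comparison behaves in the SQL filter.

```python
def is_single_char_last_name(author_last_name, author_first_name):
    """Return True when the last name is exactly one character and the
    first name is longer than one character (same test as the WHERE clause)."""
    # A NULL in SQL never satisfies the comparison, so treat None as a non-match.
    if author_last_name is None or author_first_name is None:
        return False
    return len(author_last_name) == 1 and len(author_first_name) > 1


# Hypothetical sample records; these are not taken from real data.
records = [
    {"authorLastName": "Y", "authorFirstName": "Jane"},
    {"authorLastName": "Smith", "authorFirstName": "J"},
    {"authorLastName": "Q", "authorFirstName": "R"},
    {"authorLastName": None, "authorFirstName": "Jane"},
]

# Keep only the records that match the one-character last name case.
flagged = [
    r for r in records
    if is_single_char_last_name(r["authorLastName"], r["authorFirstName"])
]
```

With the sample records above, only the first one is flagged; the others fail on last name length, first name length, or a missing value.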
