legumeinfo / mine-issues

Report ALL issues on LIS mines here! Regardless of which mine you found it on!

Merge authors on (firstname, lastname, initials)? #130

sammyjava closed 8 months ago

sammyjava commented 10 months ago

I think I intentionally do NOT merge authors during publication loads because I was concerned about conflating John Smith of Oregon State with John Smith of Iowa State. BUT I think that rare case isn't important enough to justify not merging: by merging we can get at least a partial publication list on an Author page. (It won't be complete, since some publications say J. Smith, which could also be Jane Smith of Ohio State or Jeremy Smith of Florida State.)

Maybe this is a Bad Idea. But I noticed that the same author appears in many different author records, which looks bad. Perhaps the funkiness of author names on publications means we have to live with that. (Ideally we'd use a unique identifier like ORCID, but that's never going to be available for all 4,307 author records in GlycineMine, for example.)

sammyjava commented 10 months ago

Clearly this is at the Food For Thought level. Having multiple author records doesn't impact queries. Having only last names complete in many author records DOES mean that we ought to stick with lastname in searches: R Nelson (6 records) and Rex Nelson (28 records) are presumably the same person.

sammyjava commented 8 months ago

OK I've implemented merging on Author.name, where name = firstName + " " + lastName if initials is null, else firstName + " " + initials + " " + lastName.
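For illustration, here is a minimal sketch of that key in Python. It is not the loader's actual code: the function name `author_name` and its argument order are hypothetical, and only the concatenation rule comes from the comment above.

```python
from typing import Optional


def author_name(first_name: str, last_name: str, initials: Optional[str] = None) -> str:
    """Build the merge key described above (hypothetical helper, not the loader's code).

    Follows the stated rule literally: the initials are included only when
    they are not null (None here).
    """
    if initials is None:
        return first_name + " " + last_name
    return first_name + " " + initials + " " + last_name
```

Two records that produce the same string from this function would be treated as the same author.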

And actually it turns out that R Nelson is Randall L Nelson, who presumably also appears as R L Nelson and Randall Nelson. Rex is consistently Rex T Nelson. So all 28 of Rex's author records will merge into a single "Rex T Nelson" record with 28 publications, which is a Good Thing. People with the same firstName and lastName and no initials will be joined at the hip. Tough.
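A rough sketch of what merging on that name looks like, assuming author records are simple tuples and using made-up publication IDs. `merge_authors` and the record layout are hypothetical, not the actual loader. It also shows a consequence of keying on the full name string: variant spellings such as "R Nelson" and "Randall L Nelson" would remain separate records.

```python
from collections import defaultdict


def merge_authors(records):
    """Group publication IDs by merged author name.

    records: iterable of (first_name, initials_or_None, last_name, publication_id).
    This layout is an assumption made for the example.
    """
    merged = defaultdict(set)
    for first, initials, last, pub in records:
        # Same rule as in the comment above: include initials only when present.
        name = f"{first} {last}" if initials is None else f"{first} {initials} {last}"
        merged[name].add(pub)
    return dict(merged)


# Made-up sample data for illustration only.
records = [
    ("Rex", "T", "Nelson", "pub1"),
    ("Rex", "T", "Nelson", "pub2"),
    ("R", None, "Nelson", "pub3"),
    ("Randall", "L", "Nelson", "pub4"),
]
merged = merge_authors(records)
```

Here both "Rex T Nelson" records collapse into one entry holding two publications, while "R Nelson" and "Randall L Nelson" stay distinct because their name strings differ.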