Open vikrammanikantan opened 2 months ago
Two things seem to be at work here, both related to how the mailer parses middle initials or, more generally, names with multiple spaces in them.
In approximate_name_lookup, the mailer considers the case where the directory name is a subset of the author list name, but not vice versa; i.e., a person can use an initial in the author list when it isn't in the directory, but not the other way around.
My overall feeling on how to fix this: homogenize the way names are split in the directory and in papers rather than adding more cases to check. The sensible human thing is probably to split after the first space, though we'd need to be careful about last names (if "J Rieke" becomes Marcia's entire last name in the directory, it still needs to work when she publishes as Marcia Rieke and her surname gets parsed as "Rieke"). So maybe split after the last space? Or somehow separate out the middle parts of the name entirely?
Unsure yet how to achieve this, though parsing the tex file uses regex and the directory does something else; use regex for both?
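A minimal sketch of what "parse both sources the same way" could look like, assuming directory entries look like "LastName, FirstName MiddleInitial" and author-list names look like "FirstName MiddleInitial LastName". The names `ParsedName` and `parse_name` are hypothetical and not taken from the mailer code. For names without a comma, it assumes the word after the last space is the surname, so multi-word surnames in that form would be misparsed.

```python
from typing import NamedTuple, Tuple


class ParsedName(NamedTuple):
    given: Tuple[str, ...]  # first and middle names or initials, lowercased
    last: str               # surname, lowercased


def parse_name(raw: str) -> ParsedName:
    """Parse 'Last, First M.' or 'First M. Last' into one common structure."""
    # Treat periods as spaces so "J." and "J" are parsed identically.
    cleaned = raw.replace(".", " ")
    if "," in cleaned:
        # Directory style: everything before the comma is the surname.
        last_part, _, given_part = cleaned.partition(",")
        given_tokens = given_part.split()
        last_tokens = last_part.split()
    else:
        # Author-list style: assume the final word is the surname.
        tokens = cleaned.split()
        given_tokens, last_tokens = tokens[:-1], tokens[-1:]
    if not last_tokens:
        raise ValueError(f"no surname found in {raw!r}")
    return ParsedName(
        tuple(t.lower() for t in given_tokens),
        " ".join(last_tokens).lower(),
    )


print(parse_name("Rieke, Marcia J."))  # ParsedName(given=('marcia', 'j'), last='rieke')
print(parse_name("Marcia Rieke"))      # ParsedName(given=('marcia',), last='rieke')
```

With both sources reduced to the same (given, last) shape, the middle parts are kept separate from the surname, so "J Rieke" can never end up as an entire last name.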
Two comments:
I don't think that will work for people who go by MiddleName LastName in the directory, but publish as e.g. FirstInitial MiddleName LastName; FirstInitial will get compared to MiddleName and fail.
The directory is being broken on a comma that separates LastName, FirstName (or LastName, FirstName MiddleInitial), so it's actually correctly sorting out which part of the name is which. I need to stare at the regex some more, but I think it might be assuming that just the word after the last space is the last name.
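One possible way to cover both comments, sketched under assumptions: require the surnames to agree, then check that the shorter list of given names appears in order inside the longer one, letting a single-letter initial stand in for a full name. This is symmetric, so an initial may be missing from either the directory or the paper, and it tolerates a leading FirstInitial in front of a MiddleName. The functions `split_name`, `tokens_compatible`, and `names_match` are hypothetical and are not the mailer's existing API; "Smith, Alex" is a made-up example.

```python
def split_name(raw):
    """Return (given_tokens, surname) for 'Last, First M.' or 'First M. Last'."""
    cleaned = raw.replace(".", " ")
    if "," in cleaned:
        last, _, given = cleaned.partition(",")
        tokens = given.split()
    else:
        # Assumes the word after the last space is the surname.
        *tokens, last = cleaned.split()
    return [t.lower() for t in tokens], " ".join(last.split()).lower()


def tokens_compatible(a, b):
    """An initial matches any name starting with that letter; full names must be equal."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def names_match(name_a, name_b):
    given_a, last_a = split_name(name_a)
    given_b, last_b = split_name(name_b)
    if last_a != last_b:
        return False
    needle, haystack = sorted((given_a, given_b), key=len)
    if not needle:
        return False  # a bare surname is too ambiguous to match on
    # Ordered-subsequence check: each needle token must be matched by a later
    # haystack token than the one that matched the previous needle token.
    remaining = iter(haystack)
    return all(any(tokens_compatible(n, h) for h in remaining) for n in needle)


print(names_match("Rieke, Marcia J.", "Marcia Rieke"))     # True
print(names_match("Rieke, George", "George H. Rieke"))     # True
print(names_match("Smith, Alex", "J. Alex Smith"))         # True
print(names_match("Rieke, George H.", "Marcia J. Rieke"))  # False
```

The trade-off is that the check is deliberately permissive: "J Rieke" would match "Marcia J. Rieke", so a real implementation would probably want to flag ambiguous matches instead of accepting them silently.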
I think this wasn't working even with the old directory, so I don't think reverting will help. Incidentally, I found some old emails where Marcia was unidentified when her name was on the paper as Marcia Rieke instead of Marcia J. Rieke (and the same thing for George Rieke vs. George H. Rieke).
Mailer not finding or not matching the following people: