lwhitler / arxiv-mailer

MIT License

Not Finding People - Marcia, Leandro #3

Open vikrammanikantan opened 2 months ago

vikrammanikantan commented 2 months ago

Mailer not finding or not matching the following people:

lwhitler commented 2 months ago

Two things seem to be at work here, both related to how the mailer parses middle initials or, more generally, names with more than one space in them.

My overall feeling on how to fix this: homogenize the way names are split in the directory and in papers, rather than adding more cases to check. The sensible human thing is probably to split after the first space, though we'd need to be careful about last names (if "J Rieke" becomes Marcia's entire last name in the directory, it still needs to work when she publishes as Marcia Rieke and her surname gets parsed as "Rieke"). So maybe split after the last space? Or somehow separate out the middle parts of the name entirely?

Unsure yet how to achieve this. Parsing the tex file uses regex and the directory does something else, so maybe use regex for both?
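For concreteness, here is a minimal sketch of what "one parser for both sources" could look like. The function name `parse_name` and the assumption that the directory uses `Last, First M.` while papers use `First M. Last` are illustrative only and not taken from the mailer's code. It also assumes the word after the last space is the surname in paper-style names, which will not hold for multi-word surnames.

```python
import re


# Hypothetical helper, not part of the existing arxiv-mailer code.
def parse_name(raw):
    """Return (first, middle, last) for 'Last, First M.' or 'First M. Last'."""
    # Turn periods into spaces so 'J.' and 'J' compare equal,
    # then collapse runs of whitespace.
    cleaned = re.sub(r"\s+", " ", raw.replace(".", " ")).strip()
    if "," in cleaned:
        # Directory style: everything before the first comma is the surname.
        last, given = (part.strip() for part in cleaned.split(",", 1))
        tokens = given.split()
        if not tokens:
            return "", "", last
        return tokens[0], " ".join(tokens[1:]), last
    tokens = cleaned.split()
    if not tokens:
        return "", "", ""
    if len(tokens) == 1:
        return "", "", tokens[0]
    # Paper style (assumption): the last token is the surname and
    # everything between the first and last token is "middle".
    return tokens[0], " ".join(tokens[1:-1]), tokens[-1]


print(parse_name("Rieke, Marcia J."))
print(parse_name("Marcia J. Rieke"))
print(parse_name("Marcia Rieke"))
```

With this shape, the directory entry and both published forms end up with the same first and last name, and the middle part is kept separately so it can be compared or ignored as needed.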

vikrammanikantan commented 2 months ago

Two comments:

  1. We can just split by all spaces, and then take the first element and the last element to be the first and last name, respectively.
  2. I am a little confused. It sounds like the behavior is different in the two cases. For Marcia, everything but the last name is considered the first name. But for Leandro, everything but the first name is considered the last name? In other words, their names are being split at different locations, which does not seem good.
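
Suggestion 1 could be sketched roughly as follows. The function name `split_first_last` is made up for illustration and is not from the mailer.

```python
# Hypothetical helper illustrating "split on all spaces, keep the ends".
def split_first_last(full_name):
    """Return (first, last) using only the first and last whitespace-separated tokens."""
    tokens = full_name.split()
    if not tokens:
        return "", ""
    # For a single-token name, first and last are the same token.
    return tokens[0], tokens[-1]


print(split_first_last("Marcia J. Rieke"))
print(split_first_last("Marcia Rieke"))
```

Middle tokens are discarded, so this only works when the first token is the name the person goes by in both the directory and the paper.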

lwhitler commented 2 months ago

  1. I don't think that will work for people who go by MiddleName LastName in the directory, but publish as e.g. FirstInitial MiddleName LastName; FirstInitial will get compared to MiddleName and fail.

  2. The directory is being broken on a comma that separates LastName, FirstName (or LastName, FirstName MiddleInitial), so it's actually correctly sorting out which part of the name is which. I need to stare at the regex some more, but I think it might be assuming that just the word after the last space is the last name.
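
One possible way around the MiddleName LastName problem is to compare every given-name token rather than only the first one. The sketch below is an assumption about how that might look, not the mailer's current logic; `names_match` and `given_names_compatible` are hypothetical names, and the example person "A. Beth Carter" is invented.

```python
# Hypothetical helpers, not part of the existing arxiv-mailer code.
def _tokens(name):
    """Lowercase, period-stripped, whitespace-separated tokens of a name."""
    return [t.strip(".").lower() for t in name.split() if t.strip(".")]


def given_names_compatible(given_a, given_b):
    """True if any given-name token in one matches any token in the other,
    either exactly or as an initial of the same letter."""
    for x in _tokens(given_a):
        for y in _tokens(given_b):
            if x == y:
                return True
            # A single letter is treated as an initial of the other token.
            if (len(x) == 1 or len(y) == 1) and x[0] == y[0]:
                return True
    return False


def names_match(given_a, last_a, given_b, last_b):
    """Surnames must agree exactly (case-insensitive); given names only loosely."""
    same_last = last_a.strip().lower() == last_b.strip().lower()
    return same_last and given_names_compatible(given_a, given_b)


# Directory lists "Beth Carter"; the paper has "A. Beth Carter".
print(names_match("Beth", "Carter", "A. Beth", "Carter"))
# Same surname, different people.
print(names_match("George H.", "Rieke", "Marcia J.", "Rieke"))
```

This is deliberately permissive: a bare initial matches any given name starting with that letter, so it trades some false positives for fewer missed people. It also assumes initials are separated by spaces ("M. J." rather than "M.J.").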

Incidentally, I think this wasn't working even with the old directory, so I don't think reverting will help. I found some old emails where Marcia went unidentified if her name was on the paper as Marcia Rieke instead of Marcia J. Rieke (and the same thing for George Rieke vs. George H. Rieke).