will-moore closed 5 years ago
Trying to improve regex to get surname only from all Publication Authors...
Mix of names (produced by splitting the Publication Authors value by ',' or 'and' or '&'). E.g.:
I'm not even sure what the surname is for the last 2.
If anyone can figure out a rule that gives the correct surname from each of these cases, please let me know. cc @sbesson @francesw.
Currently I'm splitting on ' ' and filtering for words that contain a lowercase letter. Then if there is 1 word, that's the surname; if there are 2 or more words, I ignore the first word (first name) and join the rest.
e.g. Francesco Paolo Casale -> Paolo Casale
But also Petri Seiler K -> Seiler
and M. Julius Hossain -> Hossain
which I think are probably wrong??
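For reference, the rule described above can be sketched in Python roughly as follows. This is only an illustration of the heuristic as stated in this comment, not the code actually used in the gallery; the function names `split_authors` and `guess_surname` are made up for the example.

```python
import re


def split_authors(value):
    """Split a Publication Authors string on ',', 'and' or '&'."""
    # \b keeps 'and' from matching inside names such as 'Anderson'
    parts = re.split(r",|\band\b|&", value)
    return [p.strip() for p in parts if p.strip()]


def guess_surname(name):
    """Apply the heuristic from this comment to a single author name."""
    # Keep only words containing a lowercase letter (drops 'K', 'M.', etc.)
    words = [w for w in name.split(" ") if any(c.islower() for c in w)]
    if len(words) <= 1:
        # A single remaining word is taken as the surname
        return words[0] if words else ""
    # Otherwise ignore the first word (assumed first name) and join the rest
    return " ".join(words[1:])


for author in ["Francesco Paolo Casale", "Petri Seiler K", "M. Julius Hossain"]:
    print(author, "->", guess_surname(author))
```

This reproduces the three results listed above, including the ones that look wrong, which shows the problem is in the rule itself: it cannot tell a middle name from the first part of a compound surname.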
If you've figured out a rule, we could use it to automatically extract the surnames and enter them as individual key-value pairs. Or maybe have each author as a separate map-ann in an /author namespace.
Re publication authors, my inclination would be to work towards unifying the formatting of the Publication Authors key in the study files and their representation in IDR. @manics has proposed a few ways to store the data as map annotations. For the study files, I can certainly conceive of using the PubMed style everywhere, i.e. Last Name 1 <Initials 1>, Last Name 2 <Initials 2>, especially since we use the PubMed ID as the primary publication identifier. Probably something worth discussing on @francesw's return.
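If the study files did move to that PubMed-style format, surname extraction would become much simpler. A minimal sketch, assuming every author is written as surname followed by a single token of uppercase initials, and that authors are separated by commas only (the helper names and the sample string are illustrative, not taken from any study file):

```python
def surname_from_pubmed(author):
    """Return the surname from a 'Last Name <Initials>' style author entry."""
    words = author.split()
    # Assumption: the initials are the final token and are all uppercase letters
    if len(words) > 1 and words[-1].isalpha() and words[-1].isupper():
        words = words[:-1]
    return " ".join(words)


def surnames_from_pubmed_list(value):
    """Split a comma-separated PubMed-style author list and extract surnames."""
    return [surname_from_pubmed(a) for a in value.split(",") if a.strip()]


print(surnames_from_pubmed_list("Casale FP, Hossain MJ, van der Berg A"))
```

Suffixes such as "Jr" and any entries that don't follow the format are not handled here, so this would only be safe once the study files are actually consistent.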
tl;dr - happy for this to be merged if there aren't any easy fixes that @will-moore wants to get in.
Nothing else to add right now, thanks.
Various fixes from feedback on the IDR gallery.
To test: