Separating author information

matildabrown / rWCVP

Generating Summaries, Reports and Plots from the World Checklist of Vascular Plants

GNU General Public License v3.0

My method so far has been to split the name from the author string on spaces, e.g. with str_split. This only really works for species-level matching, when the first two words are the genus and the specific epithet; whatever follows can be ignored, because the name can be matched without the author string. It gets messy very quickly once infraspecifics and hybrids are involved: the number of words before the author string varies, the first two words are not always genus and specific epithet, and author strings can even be embedded in the name portion (e.g. Genus species Auth1 subsp. subspecies Auth2). There are algorithmic workarounds if the dataset is consistent, but we don't have a general solution, I'm afraid.
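A minimal sketch of the space-splitting approach described above, assuming stringr is installed. The input vector `taxa` and the output column names `taxon_name` and `authors` are made up for illustration and are not part of rWCVP. The last two inputs are placeholder strings that show where the approach breaks down.

```r
library(stringr)

# Hypothetical input: scientific names with the author string appended.
taxa <- c(
  "Poa annua L.",
  "Myrcia guianensis (Aubl.) DC.",
  "Genus species subsp. subspecies Auth",  # infraspecific (placeholder)
  "Genus \u00d7 species Auth"              # hybrid (placeholder)
)

# Split on spaces into at most three pieces:
# word 1, word 2, and everything that follows.
parts <- str_split_fixed(taxa, " ", n = 3)

# Treat the first two words as the name and the remainder as the authors.
split_names <- data.frame(
  taxon_name = paste(parts[, 1], parts[, 2]),
  authors = parts[, 3],
  stringsAsFactors = FALSE
)

print(split_names)
```

The first two rows split as intended. In the third row the rank and infraspecific epithet end up in `authors`, and in the fourth the hybrid marker is taken as the epithet, so a simple split like this is only suitable for plain species-level names.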

Separating author information #47