ropensci / helminthR

Accesses parasite occurrence records from the London Natural History Museum's Host-Parasite database, which contains over a quarter of a million helminth records.
https://docs.ropensci.org/helminthR
GNU General Public License v3.0

latin names don't parse right with revisions #20

Closed (cjcarlson closed 6 years ago)

cjcarlson commented 6 years ago

I've decided to actually start letting you know about these the right way, so here goes!

Revisions included in Latin names lead to mis-parsed short names. For example:

"Cladorchis [Fischr.] watsoni (Conyngham, 1904)" parses as "Cladorchis [Fischr.]"

"Acanthogyrus (Acanthosentis) tilapiae (Baylis, 1948)" parses as "Acanthogyrus Acanthosentis"

I've been working on a function to address this:

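revis <- function(name) {
  # Split the full name into space-separated tokens
  # (named `parts` to avoid shadowing base::list)
  parts <- strsplit(name, ' ')[[1]]
  # If the second token opens with "(" or "[", skip it and use the third
  if (substr(parts[2], 1, 1) %in% c('(', '[')) {
    return(paste(parts[1], parts[3]))
  }
  # Otherwise the first two tokens are already genus and species
  paste(parts[1], parts[2])
}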

taddallas commented 6 years ago

Thanks for spotting this. I've edited the findParasite function to catch bracket symbols and hopefully parse the name correctly, but it's a bit hacky currently. It always assumes that the bracketed entry is the second element, so, much like your approach, it pastes together either the first and second elements of the split (if no brackets exist) or the first and third elements (if brackets exist). I may fiddle around with this more.
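For illustration, here is a minimal sketch of one way to relax the "bracketed entry is always in the middle" assumption: drop every token that opens with a parenthesis or square bracket, wherever it appears, then keep the first two tokens that remain. The helper name `short_name` is hypothetical, and this is not the actual findParasite code, just a sketch of the idea under those assumptions.

```r
# Hypothetical helper (not part of helminthR): build a "Genus species"
# short name by ignoring tokens that open with "(" or "[".
short_name <- function(name) {
  # Split on runs of whitespace after trimming the ends
  parts <- strsplit(trimws(name), "\\s+")[[1]]
  # Keep only tokens that do not start with a parenthesis or square bracket
  keep <- parts[!(substr(parts, 1, 1) %in% c("(", "["))]
  # Join the first two remaining tokens (genus and species epithet)
  paste(head(keep, 2), collapse = " ")
}

short_name("Cladorchis [Fischr.] watsoni (Conyngham, 1904)")
# "Cladorchis watsoni"
short_name("Acanthogyrus (Acanthosentis) tilapiae (Baylis, 1948)")
# "Acanthogyrus tilapiae"
```

One known limitation of this sketch: a bracketed entry that itself contains a space would leave its trailing token behind, so such names would still need separate handling.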

Note: the build of the updated package may fail. This is because of 502 errors (the NHM database may be down for a moment).