hurlbertlab / dietdatabase

replacewith names not being recognized as valid when using clean_all_names() #91

Closed pwinner1 closed 6 years ago

pwinner1 commented 6 years ago

The 'replacewith' scientific names in the 'bad_names_replacement.txt' file should now be nearly all correct, since I've gone back through and verified them. However, when clean_all_names() is run on 'AvianDietDatabase_fixed.txt' (which has the new 'replacewith' names), the resulting 'AvianDietDatabase_fixed_badnames.txt' file is nearly as long as 'AvianDietDatabase_badnames.txt', and most of the names from that file appear in it again.

The problem may be visible with 'Planolinus tenellus': when the name is copied from the 'AvianDietDatabase_fixed_badnames.txt' file and pasted into an ITIS search, ITIS reports that no data was found. If you type out 'Planolinus tenellus' by hand, the search returns it as a valid name.

I don't know whether this is a formatting issue, some stray character attached to the spreadsheet values that keeps the name from being read properly, or a mistake on my part somewhere in the process.
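
One way to check for a hidden character is to print the code point of every whitespace or non-ASCII character in the pasted name. The snippet below is only a rough sketch in Python: the `describe_chars` helper is a hypothetical name, not part of this repository, and the no-break space (U+00A0) in the example string is an assumption, since the actual character in the file has not been identified.

```python
import unicodedata

def describe_chars(name):
    """List (char, code point, Unicode name) for each whitespace or non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in name
        if ch.isspace() or ord(ch) > 127
    ]

# Hypothetical example: assumes the pasted name contains a no-break space.
pasted = "Planolinus\u00a0tenellus"
typed = "Planolinus tenellus"

print(pasted == typed)          # the two strings look alike but differ
print(describe_chars(pasted))   # shows which separator the pasted name uses
print(describe_chars(typed))    # shows the ordinary space for comparison
```

If the two names report different code points for the separator, the mismatch is a character problem rather than a problem with the name itself.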

ahhurlbert commented 6 years ago

Yes, there is a weird space character between the genus and species names for many of these problem names. I have no idea how they got introduced, but I've replaced them all with normal spaces in the AvianDietDatabase.txt and AvianDietDatabase_badnames.txt files.
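
For anyone who hits the same thing again, a replacement like this could be scripted rather than done by hand. The following is a minimal sketch in Python, not the method actually used for the fix above: `normalize_spaces` is a hypothetical helper, and it assumes the odd character belongs to the Unicode "space separator" category (Zs), which covers the no-break space and similar characters but not zero-width characters.

```python
import unicodedata

def normalize_spaces(text):
    """Replace every Unicode space separator (category Zs) with a plain ASCII space.

    Tabs and newlines are control characters, not Zs, so the layout of a
    tab-delimited file is left untouched.
    """
    return "".join(
        " " if ch != " " and unicodedata.category(ch) == "Zs" else ch
        for ch in text
    )

# Hypothetical example: a name with a no-break space between genus and species.
row = "Planolinus\u00a0tenellus\tColeoptera\n"
cleaned = normalize_spaces(row)
print(cleaned == "Planolinus tenellus\tColeoptera\n")
```

The same function could be applied to the full contents of a file after reading it with the correct encoding, which would need to be confirmed for these files before writing anything back.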