OpenRefine can identify lots of annoying cases where strings are spelled in different ways. I'm sure other people have thought hard about this, but I'd be willing to take a naive shot at it.
Here are a few ideas. I hope you all can make some suggestions as well.
Detect inconsistent capitalization
Detect abbreviated species names (e.g. H. sapiens)
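To make the two ideas above concrete, here is a naive sketch in Python. The function names and the regular expression are my own assumptions for illustration, not anything that exists in the package or in OpenRefine, and the abbreviation pattern only covers the simple "H. sapiens" form.

```python
import re
from collections import defaultdict

# Assumed pattern for an abbreviated binomial such as "H. sapiens":
# one capital letter, a period, optional whitespace, a lowercase epithet.
ABBREVIATED_BINOMIAL = re.compile(r"^[A-Z]\.\s*[a-z]+$")


def find_inconsistent_capitalization(values):
    """Return groups of values that differ only by letter case."""
    groups = defaultdict(set)
    for value in values:
        cleaned = value.strip()
        # casefold() gives a case-insensitive key for grouping
        groups[cleaned.casefold()].add(cleaned)
    # Keep only the keys that were spelled with more than one casing
    return {key: sorted(variants) for key, variants in groups.items()
            if len(variants) > 1}


def find_abbreviated_species(values):
    """Return values that look like an abbreviated species name."""
    return [v for v in values if ABBREVIATED_BINOMIAL.match(v.strip())]


names = ["Homo sapiens", "homo sapiens", "H. sapiens", "Mus musculus"]
print(find_inconsistent_capitalization(names))
print(find_abbreviated_species(names))
```

This only flags candidates. Deciding which spelling is the right one, or which genus "H." stands for, would still need a lookup or a human.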
This sounds great. We've got an installable package now, and there are some test datasets in the local folder to test against. I can find more, crappier datasets to put things through.