karthik / testdat

A package to run unit tests on tabular data

Identify mangled strings #14

Open davharris opened 10 years ago

davharris commented 10 years ago

OpenRefine can identify lots of annoying cases where strings are spelled in different ways. I'm sure other people have thought hard about this, but I'd be willing to take a naive shot at it.

Here are a few ideas. I hope you all can make some suggestions as well.
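
One naive starting point is the key-collision ("fingerprint") approach that OpenRefine documents for clustering: normalize each string, then flag any group of distinct spellings that collapse to the same key. The sketch below is in Python purely for illustration. The function names `fingerprint` and `find_mangled_strings` are made up for this example and are not part of testdat or OpenRefine, and the normalization steps are a simplified assumption rather than an exact copy of OpenRefine's.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Hypothetical helper: reduce a string to a normalized key."""
    # Decompose accented characters and drop anything non-ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and remove punctuation.
    cleaned = re.sub(r"[^\w\s]", "", ascii_text.strip().lower())
    # Sort the unique tokens so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def find_mangled_strings(values):
    """Hypothetical helper: group distinct spellings that share a key."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    # Only keys reached by more than one distinct spelling are suspicious.
    return [sorted(group) for group in groups.values() if len(group) > 1]


species = [
    "Homo sapiens",
    "homo sapiens ",
    "Sapiens, Homo",
    "Quercus alba",
    "Quércus alba",
    "Pinus strobus",
]
for cluster in find_mangled_strings(species):
    print(cluster)
```

This only catches differences in case, whitespace, punctuation, accents, and word order. Genuine misspellings would need something like an edit-distance comparison on top.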

karthik commented 10 years ago

This sounds great. We've got an installable package now, and there are some test datasets in the local folder to test against. I can find crappier datasets to put things through.