Open GoogleCodeExporter opened 8 years ago
I'm having trouble understanding what form this would take. Is it some
user-defined set of rules (and an associated rule language) or ...
Is the % a probability that a value is correct, or is it the percentage of
values which are valid (implying that Refine would have to be able to discern
valid from invalid with 100% accuracy)?
If you could expand on what you're envisioning, that might help developers figure
out how hard it would be to implement and whether it fits with the goals of
Refine.
Original comment by tfmorris
on 25 May 2011 at 5:29
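To make the second reading above concrete (the percentage of values that are valid), here is a minimal sketch. It assumes validity is decided by a user-supplied rule; the ZIP code pattern and the name percent_valid are hypothetical illustrations, not anything Refine provides.

```python
import re

# Hypothetical rule (an assumption for illustration): a US ZIP code is
# five digits, optionally followed by a hyphen and four more digits.
ZIP_RULE = re.compile(r"\d{5}(-\d{4})?")


def percent_valid(values, rule=ZIP_RULE):
    """Return the percentage of values that fully match the rule."""
    if not values:
        return 0.0
    valid = sum(1 for v in values if rule.fullmatch(v.strip()))
    return 100.0 * valid / len(values)


sample = ["02139", "2139", "94103-1234", "abcde"]
print(percent_valid(sample))  # two of the four values match the rule
```

Note that this only measures conformance to the rule. Whether a conforming value is actually correct is the separate, harder question raised above.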
EHOPstore, you might be further interested, or have an investment interest, in
using some of the USPS Address Verification & Address Quality solutions at
http://www.usps.com/business/addressverification/welcome.htm or you can contact
them directly, as I have done in the past:
http://www.usps.com/ncsc/ziplookup/contactinfo.htm
At my job we have a nightly Talend ETL process that scrubs one of our databases
against one of those vendor software packages (CASS Certified). We review the
results using our AEC and sometimes even clean manually with simple tools,
including Refine at times.
You might also look at Orange (http://orange.biolab.si) and perhaps try a learner
and classifier solution to the problem, if the data set sample is large enough to
support predictions (just ask them for help on their forum).
You're probably looking for something like a custom reconciliation service to use
in Refine that would utilize a CASS Certified vendor's address verification (or,
if you don't need the solution to be CASS Certified, perhaps a Google Maps API
Premier license or another web API out there).
Outside of what I just mentioned, you can certainly do a lot with just Faceting,
Splitting, and Crossing between project data sets, as demonstrated here:
http://feedproxy.google.com/~r/ouseful/~3/yCUHpNJghxo/
Original comment by thadguidry
on 26 May 2011 at 4:01
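As a rough sketch of the custom reconciliation service idea above: such a service would take each address, ask a verifier for candidates, and return them to Refine with a score and a match flag. The helper below only shapes the response. The candidate tuples, the "address" type, the 0.9 threshold, and the name to_recon_result are all assumptions for illustration, and the field names follow the reconciliation result format as I understand it, so check them against the reconciliation API documentation before relying on them.

```python
import json


def to_recon_result(candidates, match_threshold=0.9):
    """Shape verifier output into a reconciliation-style result.

    candidates: list of (id, standardized_address, score) tuples with
    score in the range 0..1, as returned by some hypothetical verifier.
    """
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    result = []
    for cand_id, name, score in ranked:
        result.append({
            "id": cand_id,
            "name": name,
            "type": ["address"],                # hypothetical type label
            "score": round(score * 100, 1),     # report as a percentage
            "match": score >= match_threshold,  # assumed auto-match cutoff
        })
    return {"result": result}


# Made-up candidates standing in for a real verifier's output.
candidates = [("a2", "1 Maine St", 0.60), ("a1", "1 Main St", 0.95)]
print(json.dumps(to_recon_result(candidates), indent=2))
```

The actual verification (CASS Certified vendor, web API, or otherwise) would sit behind whatever produces the candidates list; that part is deliberately left out here.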
Original issue reported on code.google.com by
EHOPstore@gmail.com
on 19 May 2011 at 7:28