openaddresses / openaddresses-ops

Issues-only repo for discussion of operational considerations for OA

Add Validation of Addresses #26

Open kkkmail opened 4 years ago

kkkmail commented 4 years ago

A valid address should satisfy many rules, yet some of the addresses in the data set are clearly invalid. I suggest the following:

  1. Add one or more columns to the data set that describe the data "quality" of each row.
  2. Build a post-processing engine that goes over all addresses and applies country-, region-, state-, or other locale-specific rules to compute that quality score from each address and its related data.
  3. I've done this in F# for similar address data sets with 100M+ rows, and I'd be glad to help set it up here.
  4. Once the framework is set up, experts who know the particulars of individual countries, regions, states, etc. can add further rules.
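Steps 1, 2, and 4 could look roughly like the minimal Python sketch below. It is not the F# implementation mentioned above, which isn't shown in the issue. Everything in it is an assumption for illustration: the column names (NUMBER, STREET, POSTCODE), the hypothetical rule names, the added QUALITY and FAILED_RULES columns, and the choice to score a row as the fraction of applicable rules it passes. Rules live in a registry tagged with the countries they apply to, so an expert can add a new rule without touching the scoring engine.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Hypothetical row shape: a dict keyed by column name. The column names used
# below (NUMBER, STREET, POSTCODE) are assumptions for illustration.
Row = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    countries: FrozenSet[str]  # empty set: the rule applies to every country
    check: Callable[[Row], bool]


RULES: List[Rule] = []


def rule(name: str, countries: tuple = ()):
    """Decorator that registers a check function as a named validation rule."""
    def decorator(check: Callable[[Row], bool]) -> Callable[[Row], bool]:
        RULES.append(Rule(name, frozenset(countries), check))
        return check
    return decorator


# Generic rules that apply everywhere.
@rule("street_present")
def street_present(r: Row) -> bool:
    return bool(r.get("STREET", "").strip())


@rule("number_present")
def number_present(r: Row) -> bool:
    return bool(r.get("NUMBER", "").strip())


# Illustrative country-specific rules; real ones would come from local experts.
@rule("us_zip_format", countries=("us",))
def us_zip_format(r: Row) -> bool:
    return re.fullmatch(r"\d{5}(-\d{4})?", r.get("POSTCODE", "").strip()) is not None


@rule("ca_postcode_format", countries=("ca",))
def ca_postcode_format(r: Row) -> bool:
    code = r.get("POSTCODE", "").strip().upper()
    return re.fullmatch(r"[A-Z]\d[A-Z] ?\d[A-Z]\d", code) is not None


def score_row(row: Row, country: str) -> Dict[str, object]:
    """Return a copy of the row with QUALITY and FAILED_RULES columns added.

    QUALITY is the fraction of applicable rules the row passes (0.0 to 1.0),
    and FAILED_RULES is a semicolon-separated list of the rules it failed.
    """
    applicable = [x for x in RULES if not x.countries or country in x.countries]
    failed = [x.name for x in applicable if not x.check(row)]
    out: Dict[str, object] = dict(row)
    out["QUALITY"] = round(1 - len(failed) / len(applicable), 2) if applicable else 1.0
    out["FAILED_RULES"] = ";".join(failed)
    return out
```

For example, a US row with POSTCODE "1234" passes the two generic rules but fails us_zip_format, so it gets QUALITY 0.67 and FAILED_RULES "us_zip_format". Recording which rules failed, not just the score, lets downstream users filter on the specific problems they care about.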