openaddresses / machine

Scripts for running OpenAddresses on a complete data set and publishing the results.
http://results.openaddresses.io/
ISC License

Gracefully handle source regressions #772

Status: Open. pnoll1 opened this issue 4 years ago.

pnoll1 commented 4 years ago

Machine accepts bad data from a source, which makes that run the last successful run and puts the bad data into the packaged downloads.

Example: Franklin County, WA went from 28,619 addresses to 37.

Marking the run as failed and returning the previously cached result would be much better for data consumers.
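A minimal sketch of the proposed fallback, assuming each run yields a result carrying a `failed` flag and that the last successful result per source is cached. `RunResult` and `choose_published` are hypothetical names for illustration, not part of machine's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    # Hypothetical record of one source run; not machine's real data model.
    source: str
    address_count: int
    failed: bool = False


def choose_published(new: RunResult, cached: Optional[RunResult]) -> Optional[RunResult]:
    """Pick the result to package in the data downloads.

    A failed run never replaces the cached result. If there is no cached
    result yet, nothing is published for this source.
    """
    if new.failed:
        return cached
    return new
```

With this rule, a run marked as failed leaves the previous good result in place, so data consumers keep receiving the 28,619-address version instead of the 37-address one.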

iandees commented 4 years ago

Machine doesn't know that the data is bad, but it could check whether the row count changed significantly and flag the run as an error. I think the system that @ingalls is working on to replace machine should provide for this.
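One way the row-count check could look, sketched under assumptions: the previous successful count is available, and a drop of more than a chosen fraction counts as a regression. The `is_regression` function and its 50% default threshold are hypothetical, not values used by machine. For Franklin County, (28,619 − 37) / 28,619 ≈ 0.9987, a drop of roughly 99.9%, so the run would be flagged.

```python
def is_regression(previous_count: int, new_count: int, max_drop: float = 0.5) -> bool:
    """Return True if the address count fell by more than `max_drop`
    (a fraction of the previous successful count).

    The 0.5 default is a hypothetical threshold chosen for illustration.
    """
    if previous_count <= 0:
        # No usable baseline, so a regression cannot be detected.
        return False
    drop = (previous_count - new_count) / previous_count
    # Growth gives a negative drop and is never flagged by this sketch.
    return drop > max_drop
```

This sketch flags only drops; checking the absolute relative change instead would also catch sudden jumps in row count, if those should be treated as errors too.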

pnoll1 commented 4 years ago

Is there a roadmap or public code to track progress on this?

ingalls commented 4 years ago

Yes, you can see the new processing system here: https://github.com/openaddresses/batch