Open riordan opened 8 years ago
I like this idea. My first thought on implementation is a set of named expected outputs with low-precision geolocations, e.g.:
{
    …
    "expected": {
        "city hall": {
            "NUMBER": "1",
            "STREET": "Dr Carlton B Goodlett Place",
            "POSTCODE": "94102",
            "LAT": 37.7793,
            "LON": -122.4188
        },
        "one market": {
            "NUMBER": "1",
            "STREET": "Market Street",
            "POSTCODE": "94105",
            "LAT": 37.7939,
            "LON": -122.3949
        }
    }
}
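For illustration, a check against an "expected" block like this could look roughly like the sketch below. It assumes the processed output is available as an iterable of dicts keyed by the same column names (for example from `csv.DictReader`); the function names and the 0.001 degree tolerance are hypothetical, not anything that exists in the project.

```python
def _near(value, target, tolerance):
    """True if value parses as a float within tolerance of target."""
    try:
        return abs(float(value) - target) <= tolerance
    except (TypeError, ValueError):
        return False


def _same_text(row, want, key):
    """Case-insensitive, whitespace-trimmed comparison of one column."""
    return str(row.get(key, "")).strip().lower() == str(want[key]).strip().lower()


def missing_expected(rows, expected, tolerance_deg=0.001):
    """Return the names of expected addresses that match no output row.

    tolerance_deg is an assumed "low precision" threshold in degrees.
    """
    rows = list(rows)  # allow generators; we scan once per expected entry
    missing = []
    for name, want in expected.items():
        found = any(
            _same_text(row, want, "NUMBER")
            and _same_text(row, want, "STREET")
            and _same_text(row, want, "POSTCODE")
            and _near(row.get("LAT"), want["LAT"], tolerance_deg)
            and _near(row.get("LON"), want["LON"], tolerance_deg)
            for row in rows
        )
        if not found:
            missing.append(name)
    return missing
```

Any name returned by `missing_expected` would be an address to investigate, either because it is absent or because one of its fields no longer matches.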
I like the idea too! I was going to suggest something even simpler, like "has more than 1000 rows" and "has an address with Market Street in the name". @migurski's proposal allows for more precise tests but requires more typing.
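As a rough sketch of the simpler checks, assuming output rows are dicts with a "STREET" key (the function name and result keys are made up for illustration):

```python
def simple_checks(rows, min_rows=1000, street_substring="Market Street"):
    """Run two coarse sanity checks over the processed rows.

    Returns a dict of check name -> bool so a caller can report
    which check failed.
    """
    rows = list(rows)
    needle = street_substring.lower()
    return {
        # "has more than 1000 rows"
        "enough_rows": len(rows) > min_rows,
        # "has an address with Market Street in the name"
        "has_street": any(
            needle in (row.get("STREET") or "").lower() for row in rows
        ),
    }
```

The thresholds would live in each source's configuration; the defaults here only mirror the two examples mentioned above.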
One other possibility (though it wouldn't work for all sources) is identifying if there's a "Full Address" field and using it to test if all the components we've identified are there.
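One way that could work, sketched under the assumption that the source's full address is carried through in a column (called "FULL_ADDRESS" here purely for illustration):

```python
def components_missing_from_full(row, full_field="FULL_ADDRESS",
                                 parts=("NUMBER", "STREET")):
    """Return the component names whose values do not appear in the
    full address string. Empty components are reported as missing.

    This is a plain substring test after case and whitespace
    normalization, so it can give false positives (e.g. "1" is
    found inside "100").
    """
    def norm(value):
        return " ".join(str(value or "").lower().split())

    full = norm(row.get(full_field))
    missing = []
    for part in parts:
        value = norm(row.get(part))
        if not value or value not in full:
            missing.append(part)
    return missing
```

An empty list means every extracted component was found in the full address. Sources that abbreviate differently in the two fields ("Street" vs. "St") would need something smarter than a substring match.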
Sometimes source data feeds break. With ESRI servers, it's pretty common for a layer, identified by a number, to have that number reassigned, and for invalid address data to come back.
It would be great if there were a way to create a few valid tests for each data source as part of each source configuration: known addresses that should be in the dataset each time and that, if missing or malformed, should be investigated.
[Please feel free to rewrite this ticket once a stronger specification is agreed upon]