Open trescube opened 8 years ago
@stephenkhess I noticed this in a lot of the Georgia sources that were recently added. These values are most likely pulled in directly from the source data, so they are artifacts introduced by the data owners rather than by any machine processing on our side. They probably represent missing street numbers and should be empty instead of `0`. It's another good example of something that could be handled by some future QA.
See https://github.com/openaddresses/openaddresses/pull/1251
The top 10 states for `0` house numbers are:
State | Count |
---|---|
ca | 541730 |
ga | 281365 |
tx | 192020 |
ma | 131522 |
al | 86555 |
wa | 55452 |
ok | 37299 |
nv | 36700 |
id | 29204 |
fl | 25492 |
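For anyone who wants to reproduce a tally like this, here is a minimal sketch. It assumes you can already iterate over `(state, house_number)` pairs from the processed output. That pairing and the function names are hypothetical, not part of any existing OpenAddresses tooling.

```python
from collections import Counter


def is_zero_house_number(value):
    """Return True when the value parses as numeric zero ("0", "0.0", " 000 ")."""
    text = str(value).strip()
    if not text:
        # Blank values are missing, not zero.
        return False
    try:
        return float(text) == 0
    except ValueError:
        # Non-numeric values such as "12A" are never zero.
        return False


def count_zero_house_numbers(rows):
    """Count zero-valued house numbers per state.

    `rows` is an iterable of (state, house_number) pairs (an assumed shape).
    """
    counts = Counter()
    for state, number in rows:
        if is_zero_house_number(number):
            counts[state] += 1
    return counts
```

Calling `count_zero_house_numbers(rows).most_common(10)` would give a ranking in the same shape as the table above.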
I think this is the source data. Looking at San Francisco for example, I see 1,661 rows with `0.0` `ADDR_NUM` values (they are stored as floating point). By way of contrast, Alameda County stores its `ST_NUM` values as strings, and contains no zeros.
So, I think this might be a mistake on the part of the data publisher. Should we treat numeric zeros as missing values? @stephenkhess is there enough knowledge about what’s valid in a place like California to add information to sources that would treat zeros as missing data?
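To make the question concrete, here is a rough sketch of what "treat numeric zeros as missing" could look like as a per-source opt-in. The function name and the `zero_is_missing` flag are hypothetical. Nothing like this exists in the source definitions today, and it is only one possible way to do it.

```python
def normalize_house_number(value, zero_is_missing=True):
    """Return the house number as a string, mapping numeric zero to "".

    `zero_is_missing` is a hypothetical per-source switch: sources where
    "0" is a legitimate house number would set it to False.
    """
    text = "" if value is None else str(value).strip()
    if not zero_is_missing or not text:
        return text
    try:
        # Handles both string zeros ("0") and float zeros (0.0 -> "0.0").
        is_zero = float(text) == 0
    except ValueError:
        # Non-numeric values such as "12A" pass through unchanged.
        is_zero = False
    return "" if is_zero else text
```

With this, a float `0.0` like the San Francisco values and a string `"0"` both come out empty, while values like `"12A"` are left alone.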
@feomike provided some insight on this after our phone call with @iandees and Mapzen folks earlier in the fall. I’m going to open up a discussion in ops, to see if there’s any consistency of opinions on this.
In the latest US/CA data, there are a lot of addresses with either 0 or blank house numbers. I may be wrong, but in my experience with street geocoding at MapQuest, "0" is not a valid house number in the US/CA (though I've read that "0" is valid in some European countries). Are these meant to represent the street in general rather than a particular house number, or are they bad data?