Open missinglink opened 3 years ago
🤔 Are these one-offs in the data set? Maybe we should ask the county to fix the data?
It's definitely uncommon in OA, at least I've never noticed it before. Within this one file it happens a lot:
ogr2ogr -f CSV /vsistdout/ addrapn_datasd.dbf \
| awk -F, '{ if($4 && $4==$6) {print $0} }' \
| xsv count
3595
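The awk one-liner above compares columns by position and splits on every comma, so a quoted field containing a comma would shift the columns. A rough sketch of the same count keyed on column names, in Python: addrpdir, addrpostd and addrsfx are the names mentioned in this thread, while addrnmbr, addrname and the sample rows are made up for illustration.

```python
import csv
import io


def count_identical_directionals(csv_text, pre_col="addrpdir", post_col="addrpostd"):
    """Count rows whose pre- and post-directional are non-empty and equal."""
    reader = csv.DictReader(io.StringIO(csv_text))
    count = 0
    for row in reader:
        pre = (row.get(pre_col) or "").strip()
        post = (row.get(post_col) or "").strip()
        if pre and pre == post:
            count += 1
    return count


# Hypothetical sample rows, not taken from the San Diego source file.
sample = (
    "addrnmbr,addrpdir,addrname,addrsfx,addrpostd\n"
    "100,S,39TH,ST,S\n"
    "200,N,MAIN,ST,\n"
    '300,,"SMITH, JR",AVE,E\n'
)
print(count_identical_directionals(sample))  # 1
```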
Looking at the source, it could also be that addrpdir isn't what we think it is?
The post field is named addrpostd, I would expect the pre to be called addrpred but it's called addrpdir 🤷‍♂️.
It might still be a good idea to add some logic in machine to catch this.
I think whenever the pre and post directional are identical it should always be considered an error?
Only one directional string should be added to the street string in this case.
[edit] If I were to choose which one, I'd favour keeping the post since it's much easier for consumers of the data to detect post-directionals than pre-directionals.
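For illustration, a minimal sketch of that rule in Python. This is not how machine currently works; resolve_directionals is a hypothetical helper, and the case-insensitive comparison is an assumption on my part.

```python
def resolve_directionals(pre, post):
    """Return (pre, post), dropping the pre-directional when it duplicates the post."""
    pre = (pre or "").strip()
    post = (post or "").strip()
    # Identical non-empty values are treated as a source error: keep only the post.
    if pre and pre.upper() == post.upper():
        return "", post
    return pre, post


print(resolve_directionals("S", "S"))  # ('', 'S')
print(resolve_directionals("N", ""))   # ('N', '')
print(resolve_directionals("E", "W"))  # ('E', 'W')
```

With this rule "S 39TH ST S" would come out as "39TH ST S", while streets that have only one directional, or two different ones, are left alone.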
FWIW there are other logical errors in the San Diego geojson file, also because the source file is messy.
One thing I noticed is that machine inserts a space when the field is empty, so in these cases where there is no addrsfx we see a double space.
cat us_ca_san_diego-addresses-county.geojson \
| jq -r '.properties.street' \
| grep -E '^[NSEW]\s.{1,3}\s\s[NSEW]$'
W E  W
W E  W
E AVE  E
W E  W
W E  W
E AVE  E
E AVE  E
W E  W
E AVE  E
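One way to avoid the double space would be to drop empty components before joining. A small Python sketch, assuming the street string is built from separate directional, name and suffix values; join_street is a hypothetical helper, not an existing machine function.

```python
def join_street(*parts):
    """Join street components with single spaces, skipping empty or missing ones."""
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return " ".join(cleaned)


# An empty or missing suffix no longer leaves a gap in the output.
print(join_street("W", "E", "", "W"))       # W E W
print(join_street("E", "AVE", None, "E"))   # E AVE E
```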
Heya,
I noticed a street name in the San Diego file today "S 39TH ST S" which has the "South" directional added twice:
It seems that the error is caused by the source data including both pre (addrpdir) and post (addrpostd) directional columns with the value 'S':
Would it be possible to add a check in machine which only adds one of these values to the street field when both are present?