pelias / openaddresses

Pelias import pipeline for OpenAddresses.
MIT License

0 house numbers #17

Closed: missinglink closed this issue 8 years ago

missinglink commented 9 years ago

I'm not sure whether these are useful or simply a relic of the OA schema enforcing a house number:

http://pelias.compare.s3-website-eu-west-1.amazonaws.com/#/search%3Fsize=4&input=0%20west%204th%20and%20oak%20street

0 west 4th and oak street

1)  0 West 4th & Oak Street, Anniston, AL
2)  0 West 4th Avenue West, Escondido, CA
3)  0 West 4th Avenue, Holly Grove, AR
4)  0 West 4th Street, El Dorado, AR

@riordan thoughts?

riordan commented 9 years ago

0-addresses are very real and very much a pain in the ass. This example is taken blatantly from "Falsehoods Programmers Believe About Addresses".

As much of an edge case as it is, we have no choice but to trust it as valid until proven otherwise, and even then only for that instance.

Part of me is kinda happy that they're zero indexing though.

In reply to Peter Johnson a.k.a. insertcoffee, who wrote on Wed, Jul 1, 2015 at 6:52 AM.

missinglink commented 9 years ago

right, but I assumed that they are very very uncommon?

I suspect that having two "0 West 4th Avenue" results in two different states indicates a data issue?

missinglink commented 9 years ago

Needs an investigation; I'm moving this to in-progress and self-assigning for a look-see:

 1) 0 South, Woodstock, New Brunswick
 2) 0 South, New Brunswick
 3) 0 South, New Brunswick
 4) 0 Southers, New Brunswick
 5) 0 South, New Brunswick
 6) 0 South, St. George, New Brunswick
 7) 0 South, Doaktown, New Brunswick
 8) 0 South, Saint John, New Brunswick
 9) 0 South, Big Bear City, CA
10) 0 South, San Bernardino County, CA
11) 0 South, New Brunswick
12) 0 South Road, North Frontenac, Ontario
13) 0 South Drive, Mountain View, CA
14) 0 South Street, Morro Bay, CA
15) 0 South Road, North Frontenac, Ontario
16) 0 South Street, San Bernardino County, CA
17) 0 South Street, Macon, GA
18) 0 South Street, Raleigh, NC
19) 0 South Road, North Frontenac, Ontario
20) 0 South Street, Morristown, NJ
21) 0 South Avenue, Felsenthal, AR
22) 0 South Road, San Bernardino County, CA
23) 0 South Street, Parma, ID
24) 0 South Drive, Big Bear City, CA
25) 0 South Street, Adelanto, CA
26) 0 South Alley, Anniston, AL
27) 0 South Road, San Bernardino County, CA
28) 0 South Street, Ashley, OH
29) 0 South Lane, Crest, CA
30) 0 South Avenue, Wake Forest, NC
31) 0 South Road, Adelanto, CA
32) 0 South Street, German, OH
33) 0 South Avenue, Parma, ID
34) 0 South Street, Windsor, Ontario
35) 0 South Road, North Frontenac, Ontario
36) 0 South Street, Brunswick, GA
37) 0 South B Street, Livingston, MT
38) 0 South Grove Street, Ashley, OH
39) 0 South F Street, Carbon County, MT
40) 0 South K Street, Livingston, MT

riordan commented 9 years ago

They are. Upon further investigation, lots of these don't exist. Not sure if the OpenAddresses folks are even aware.

Opening a ticket on their end.

In reply to Peter Johnson a.k.a. insertcoffee, who wrote on Wed, Jul 1, 2015 at 12:41 PM.

riordan commented 9 years ago

There are quite a few null entries in these datasets. In all likelihood, the bad data is outside their control and comes from whatever GIS system exported it in the first place. Mixed zeroes and nulls appear here.
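To gauge how common the mixed zeroes and nulls are in a given source, a quick audit could bucket every row's house number. The following is a minimal Python sketch, assuming an OpenAddresses-style CSV with a `NUMBER` column (the column name is an assumption; adjust it to the source's header). `classify_number` and `audit_house_numbers` are hypothetical helpers, not part of the Pelias importer.

```python
import csv
import io
from collections import Counter
from typing import Optional, TextIO


def classify_number(raw: Optional[str]) -> str:
    """Bucket a raw house-number value as 'null', 'zero' or 'ok'."""
    if raw is None or raw.strip() == "":
        return "null"  # empty cell, or column missing from the row
    if raw.strip().lstrip("0") == "":
        return "zero"  # "0", "00", ... (only zero digits)
    return "ok"  # anything else, including values like "0A"


def audit_house_numbers(f: TextIO, column: str = "NUMBER") -> Counter:
    """Count null / zero / ok house numbers in an OA-style CSV stream."""
    counts: Counter = Counter()
    for row in csv.DictReader(f):
        counts[classify_number(row.get(column))] += 1
    return counts


if __name__ == "__main__":
    # Toy rows using only a few OA-style columns; not real source data.
    sample = io.StringIO(
        "NUMBER,STREET,REGION\n"
        "0,West 4th Avenue,AR\n"
        ",South Road,CA\n"
        "123,South Street,NC\n"
    )
    print(audit_house_numbers(sample))
```

Keeping "zero" separate from "null" keeps the two failure modes apart, so a source dominated by one or the other stands out when the counts are compared.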

Following up on Dave Riordan's own reply of Wed, Jul 1, 2015 at 4:06 PM.

missinglink commented 9 years ago

@riordan I can't find the upstream ticket for this, did we file one with them in the end?

vesameskanen commented 8 years ago

I recently ran into this problem with certain Finnish data. It seems to me that OpenAddresses is pretty heavily 'street address' oriented. Data sources that also include POI/place-name data list those entries as a fake street with a zero house number. So many of the listed 'streets' do not exist at all; they are just place names. Unfortunately, the bogus zero ends up in the final label displayed to the user: imagine a search result like '0 Statue of Liberty', which is not good. I am planning to implement an option in the OA importer to handle such names properly. Basically, a 0 house number would generate a venue rather than an address, at least for specified countries.
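The venue-versus-address idea could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the importer's actual code: `ZERO_AS_VENUE_COUNTRIES`, `OARecord`, `Doc`, `is_zero` and `to_doc` are all hypothetical names, and `Doc` is a simplified stand-in rather than the real Pelias document model.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional

# Hypothetical setting: country codes whose sources use house number "0"
# to carry place names (POIs) in as fake streets.
ZERO_AS_VENUE_COUNTRIES: AbstractSet[str] = frozenset({"FI"})


@dataclass(frozen=True)
class OARecord:
    """Minimal stand-in for one OpenAddresses row (hypothetical shape)."""
    number: str
    street: str
    country: str


@dataclass(frozen=True)
class Doc:
    """Stand-in for the document the importer would emit (not pelias/model)."""
    layer: str                      # "venue" or "address"
    name: str                       # label text shown to the user
    housenumber: Optional[str] = None
    street: Optional[str] = None


def is_zero(number: str) -> bool:
    """True for "0", "00", ... but not for empty strings or "0A"."""
    s = number.strip()
    return s != "" and s.lstrip("0") == ""


def to_doc(rec: OARecord,
           zero_as_venue: AbstractSet[str] = ZERO_AS_VENUE_COUNTRIES) -> Doc:
    number, street = rec.number.strip(), rec.street.strip()
    if is_zero(number) and rec.country.upper() in zero_as_venue:
        # The "street" is really a place name: emit a venue, drop the fake 0.
        return Doc(layer="venue", name=street)
    # Default path: keep the record as an ordinary address.
    return Doc(layer="address", name=f"{number} {street}",
               housenumber=number, street=street)


if __name__ == "__main__":
    # With "US" enabled, the record becomes a venue named "Statue of Liberty".
    print(to_doc(OARecord("0", "Statue of Liberty", "US"), frozenset({"US"})))
    # With the default setting (FI only), it stays an address "0 Statue of Liberty".
    print(to_doc(OARecord("0", "Statue of Liberty", "US")))
```

Keeping the country list as an explicit opt-in mirrors the "at least for specified countries" caveat: elsewhere, as riordan argued earlier in the thread, a 0 house number may be real, so it is left untouched.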