openplans / openblock

OpenBlock is a web application and RESTful service that allows users to browse and search their local area for "hyper-local news
61 stars 26 forks source link

AddressGeocoder may raise InvalidBlockButValidStreet after finding a match #287

Open slinkp opened 11 years ago

slinkp commented 11 years ago

Opening this as a ticket since I'm not entirely sure of the fix. Ran across this when tracking down why what seemed to be a valid address for my install ("13063 Old Lake Rd") raised !InvalidBlockButValidStreet on a geocode attempt rather than finding the correct match. Within !AddressGeocoder do_geocode, parse comes up with 3 alternatives:

(Pdb) pprint.pprint(locations)
[{'number': u'13063', 'pre_dir': None, 'street': u'OLD', 'suffix': None, 'post_dir': None, 'city': u'LAKE', 'state': u'RD', 'zip': None},
 {'number': u'13063', 'pre_dir': None, 'street': u'OLD LAKE', 'suffix': 'RD', 'post_dir': None, 'city': None, 'state': None, 'zip': None},
 {'number': u'13063', 'pre_dir': None, 'street': u'OLD LAKE RD', 'suffix': None, 'post_dir': None, 'city': None, 'state': None, 'zip': None}]

The 2nd of these works: self._db_lookup(loc) for that one returns the correct block. However on the next iteration of the for loc in locations loop, self._db_lookup(loc) finds nothing and the code proceeds to try harder to get a match on this loc, by checking for misspellings, etc. One of the things it does is to try a block lookup on the street name, without the number. That block lookup in my case returns:

[<Block: 1-540 Old Lake Rd Ext.>]

The existence of this block that matches on name (OLD LAKE RD) but not number causes an !InvalidBlockButValidStreet exception to be raised at this point where the code is trying harder to find a match on loc. For this particular case I can "fix" the problem by only going down the "try harder" path if a match has not already been found in a previous loop iteration. But I'm not sure if that's a good fix...couldn't this path be encountered on an earlier iteration of the for loop, where a later iteration would be able to find a match via simple self._db_lookup(loc)?

The overall structure of "try all these alternatives" and then for each one "try a bit harder on this one by trying a few alternatives" makes me a bit uneasy. Besides this possibility of falsely raising !InvalidBlockBugValidStreet I'm wondering if this try harder stuff could be introducing spurious ambiguous results where the basic self._db_lookup(loc) succeeds on one of the alternatives but the "try harder" code also succeeds?

I'll attach a patch with test and possible fix, though as noted I'm not really sure about the fix.

slinkp commented 11 years ago

Thanks for the patch. Your analysis is right and coincidentally I had noticed this and tackled it while I was working on handling street prefixes, which I've only just merged to master branch. Now the InvalidBlockButValidStreet exception doesn't get raised till after the loop, and then only if no results were found. Can you please try updating and let me know if it works for your example case?

I didn't address the overall "try everything" code smell though. I'm sure the geocoder could use some deeper thought and work than I gave it, but unfortunately I am out of time and funds for openblock work so I can't do it myself.

One approach might be something like:

First do a loop without all the "try these alternatives" junk. During this pass keep track of which loc gave which results, and suppress exceptions. After this loop, if you have exactly one result, you're done.

If you got no results, you can do another loop doing the "try harder" stuff.

slinkp commented 11 years ago

Ticket imported from Trac: http://developer.openblockproject.org/ticket/294 Reported by: kmtracey