openplans / openblock

OpenBlock is a web application and RESTful service that allows users to browse and search their local area for "hyper-local news

Address extraction code is impossible to work on #128

Open slinkp opened 12 years ago

slinkp commented 12 years ago

The regular expression that ebdata.nlp.addresses uses to find addresses is actually a 100-line regex into which a 130-line regex is inserted 11 times. The final regex is over 1500 lines long.
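To illustrate the structure described above, here is a toy sketch of how a regex built by inserting one sub-pattern into an outer template grows in size. This is not the actual ebdata.nlp.addresses code: the names STREET_NAME, ADDRESS_TEMPLATE, ADDRESS_RE, and expanded_line_count are hypothetical, and both patterns are tiny stand-ins for the real 130-line and 100-line ones.

```python
import re

# Hypothetical stand-in for the repeated sub-pattern (the real one is ~130 lines).
STREET_NAME = r"""
    (?:
        [A-Z][a-z]+          # a capitalized word
        (?:\s[A-Z][a-z]+)?   # optionally a second capitalized word
    )
"""

# Hypothetical stand-in for the outer template (the real one is ~100 lines).
# Every %(street)s placeholder receives a full copy of STREET_NAME.
ADDRESS_TEMPLATE = r"""
    \b
    (?P<number>\d{1,5})\s
    (?P<street>%(street)s)
    \s(?P<suffix>St|Ave|Blvd)\b
    |
    \b
    (?P<cross_a>%(street)s)
    \s(?:and|&)\s
    (?P<cross_b>%(street)s)
    \b
"""

# The compiled pattern is the template with the sub-pattern pasted in each time.
ADDRESS_SOURCE = ADDRESS_TEMPLATE % {"street": STREET_NAME}
ADDRESS_RE = re.compile(ADDRESS_SOURCE, re.VERBOSE)


def expanded_line_count(outer_lines, inner_lines, repeats):
    """Rough size of the expanded regex: the outer template plus one
    copy of the inner pattern per insertion point."""
    return outer_lines + inner_lines * repeats


if __name__ == "__main__":
    # Using the figures from this ticket: 100 + 130 * 11
    print(expanded_line_count(100, 130, 11))
    print(ADDRESS_RE.search("at 123 Main St today").groupdict())
```

With the figures from this ticket, the estimate comes out at 1530 lines, consistent with "over 1500". Even in the toy version, a change to the sub-pattern silently alters every place it is inserted, which is a large part of why the real expression is so hard to debug.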

It is almost impossible to debug, fix, or extend this regex. We need to rethink the address extraction approach completely. I am investigating whether there is existing natural-language processing work we can leverage.

slinkp commented 12 years ago

Ticket imported from Trac: http://developer.openblockproject.org/ticket/128
Reported by: slinkp