Closed aahmad closed 9 years ago
This is an inherent limitation of the parsing method. I ported the same parser to Python, and the port has roughly the same class of problems. Basically, you can parse about 95% of US addresses with simple regular expressions. The remaining 5% require a much greater effort.
Proper US street parsing, per USPS rules, is right to left, bottom to top. There's less ambiguity towards the end of the address, so that approach has more hope of working without a full street name database. Working back to front, once you've seen a street type, a second street type is probably an error. But there are exceptions in Salt Lake City and parts of Brooklyn.
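A rough sketch of that back-to-front idea in Python, for illustration only. The `STREET_TYPES` table here is a tiny hypothetical stand-in (not the one in this code base), `split_street` is a made-up helper name, and the function only looks at the street line, not the whole address. It takes the rightmost token that is a known street type and treats everything to its left as the name, even if the name itself looks like a street type:

```python
# Hypothetical, heavily truncated suffix table (assumption for this sketch).
# A real table would carry the full list of USPS street suffixes.
STREET_TYPES = {
    "st": "St", "street": "St",
    "ave": "Ave", "avenue": "Ave",
    "ct": "Ct", "court": "Ct",
    "wells": "Wls", "wls": "Wls",
}


def split_street(street):
    """Split a street line into (name, street_type, trailing), scanning right to left."""
    tokens = street.replace(".", "").split()
    # Walk backwards from the last token. Index 0 is never considered,
    # so a one-word street such as "Wells" stays a name, not a type.
    for i in range(len(tokens) - 1, 0, -1):
        canonical = STREET_TYPES.get(tokens[i].lower())
        if canonical is not None:
            name = " ".join(tokens[:i])           # may itself contain a street-type word
            trailing = " ".join(tokens[i + 1:])   # e.g. a post-directional like "NW"
            return name, canonical, trailing
    # No street type found anywhere after the first token.
    return " ".join(tokens), None, ""


print(split_street("Wells St"))
print(split_street("N Wells St"))
print(split_street("Main St. NW"))
```

This does nothing for the Salt Lake City or Brooklyn style exceptions, and it still needs a real suffix table, so it only narrows the problem rather than solving it.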
@John-Nagle is correct. The approach taken in this code makes it hard to deal with the issue you list. I don't have the time and/or enough brain power to rethink this code base from the ground up to solve it. If you can fix it, I'll accept a pull request.
If there is a street name as:
Wells is a street name which is also a key in the `STREET_TYPES` hash (the same is true for any street name that is also a key in that hash).