SPTKL opened this issue 5 years ago (status: Open)
usaddress is good. The purpose of this library is to provide a lighter-weight solution that is specific to NYC addresses and their edge cases. Depending on your needs, usaddress may make more sense.
As for this particular issue, I am assuming that every house number (PHN) is separated from the street name by whitespace. The relevant code is here: https://github.com/ishiland/nyc-parser/blob/fc79d8127b85da9f07c5fa47cc0372eb70361b37/nycparser/nycparser.py#L29-L30
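To make the whitespace assumption concrete, here is a minimal sketch. It is not the code at the linked lines; `split_house_number` and its pattern are hypothetical. It takes the leading token as the house number, allowing an optional Queens-style hyphenated part and one trailing letter, and requires whitespace after it:

```python
import re

# Hypothetical pattern (not nycparser's): digits, an optional hyphenated
# part such as "37-19", and an optional single trailing letter such as "141A".
# The number must be followed by whitespace, mirroring the assumption above.
HOUSE_NUMBER = re.compile(r"^(\d+(?:-\d+)?[A-Z]?)\s+(.+)$", re.IGNORECASE)


def split_house_number(address):
    """Return (house_number, rest), or None if no leading house number is found."""
    match = HOUSE_NUMBER.match(address.strip())
    if match is None:
        return None
    return match.group(1).upper(), match.group(2)


print(split_house_number("141A MOTT STREET"))  # ('141A', 'MOTT STREET')
print(split_house_number("37-19 82 STREET"))   # ('37-19', '82 STREET')
print(split_house_number("141MOTT STREET"))    # None: no whitespace after the number
```

Note that a whitespace-based split like this still returns `('141', 'FRONT MOTT STREET')` for "141 FRONT MOTT STREET", so position words such as REAR and FRONT end up in the street name unless they are handled separately.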
Does usaddress successfully parse this address? If you are able to provide test data for issues like these, that would be very helpful. Thanks!
Actually, usaddress fails on this address too: it labels REAR as part of the street name. Sometimes it correctly labels REAR or FRONT as address number suffixes. I think I'm going to train a better usaddress model using the PAD data. Alternatively, we could create an exception for REAR and FRONT, but then we have tricky cases like the ones below:
141 FRONT MOTT STREET, Manhattan, New York, NY, USA
and
141 FRONT STREET, Manhattan, New York, NY, USA
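One way to sketch the REAR/FRONT exception for the two addresses above is a lookahead heuristic: treat the position word as a house-number suffix only when the word after it is not a street type. Everything here is hypothetical (the function name, the deliberately short street-type list), and it assumes the street line has already been separated from the borough/city part ("Manhattan, New York, NY, USA"):

```python
# Hypothetical and deliberately incomplete list of street-type words.
STREET_TYPES = {
    "STREET", "ST", "AVENUE", "AVE", "PLACE", "PL", "ROAD", "RD",
    "BOULEVARD", "BLVD", "LANE", "LN", "DRIVE", "DR",
}
POSITION_WORDS = {"REAR", "FRONT"}


def parse_street_line(street_line):
    """Split e.g. '141 FRONT MOTT STREET' into (house_number, suffix, street).

    REAR/FRONT is treated as a house-number suffix only when the word after
    it is not a street type; otherwise it stays part of the street name.
    """
    house_number, *rest = street_line.split()
    if (
        len(rest) >= 2
        and rest[0].upper() in POSITION_WORDS
        and rest[1].upper() not in STREET_TYPES
    ):
        return house_number, rest[0].upper(), " ".join(rest[1:])
    return house_number, None, " ".join(rest)


print(parse_street_line("141 FRONT MOTT STREET"))  # ('141', 'FRONT', 'MOTT STREET')
print(parse_street_line("141 FRONT STREET"))       # ('141', None, 'FRONT STREET')
```

This still breaks on street names that begin with a street-type word: "141 REAR AVENUE A" comes back as `('141', None, 'REAR AVENUE A')`, which is why a lookup against real street names (e.g. from PAD) is probably needed for a robust fix.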
Those seem like difficult scenarios to account for. I'd be interested in knowing if you have any success using the PAD data to train usaddress. If not, we can work in some kind of solution with nyc-parser. Perhaps I should write some better tests with the PAD data.
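A rough sketch of turning PAD-style rows into parser test cases follows. The column names (`house_number`, `street_name`, `borough`), the sample rows, and the shape of the `expected` dict are hypothetical placeholders, not the actual PAD file layout; they would need to be mapped to the real PAD fields when loading the data:

```python
import csv
import io

# Hypothetical PAD-like extract; the real PAD files use a different layout.
SAMPLE = """house_number,street_name,borough
141,MOTT STREET,MANHATTAN
141,FRONT STREET,MANHATTAN
37-19,82 STREET,QUEENS
"""


def build_cases(csv_text):
    """Yield (input_address, expected_fields) pairs for parser tests."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        address = f"{row['house_number']} {row['street_name']}, {row['borough']}"
        expected = {"house_number": row["house_number"], "street": row["street_name"]}
        yield address, expected


cases = list(build_cases(SAMPLE))
print(cases[0])
```

The resulting list can be fed to `pytest.mark.parametrize("address,expected", cases)` so that each PAD row becomes its own test.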
Hey @ishiland, this is an awesome package! We've been using https://github.com/datamade/usaddress for address parsing (but we have problems parsing cross streets and place names) and would love to adopt your nyc-parser. One issue I found is that house numbers containing letters produce wrong parsing results, e.g.
should instead be