hassansin / parse-address

US Street Address Parser
http://hassansin.github.io/parse-address/

added Zip4 tests #13

Closed: yankeeinlondon closed this 6 years ago

yankeeinlondon commented 6 years ago

All of the test cases you had set up seem to work as expected, but there are some other patterns I'm expecting from users, so I added some more tests to represent them. In general there still seems to be an issue with recognizing a Zip5+4 written without a dash/hyphen (e.g., 90291 3606).

yankeeinlondon commented 6 years ago

Sorry, I would have put in my attempted fix too, but I'm working in Ghana for the next two weeks and my Internet is a bit dodgy.

yankeeinlondon commented 6 years ago

Looking at the RegEx, I was under the apparently false impression that changing:

zip: "(?<zip>\\d{5})-?(?<plus4>\\d{4})?",

to

zip: "(?<zip>\\d{5})[- ]?(?<plus4>\\d{4})?",

would pick up the failing syntax in the added test cases but it doesn't seem to.
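To make the difference between the two fragments concrete, here is a minimal sketch that tests each one in isolation. It assumes native JavaScript named groups as a stand-in for however the library composes its patterns, and `zipParts` is a hypothetical helper written only for this illustration. Run on its own, the proposed fragment does capture the space-separated form, which is consistent with the remaining failures coming from how the fragment interacts with the rest of the address pattern.

```javascript
// The two `zip` fragments quoted above, tested in isolation.
// Assumption: native named groups stand in for the library's own regex composition.
const original = new RegExp("(?<zip>\\d{5})-?(?<plus4>\\d{4})?");
const proposed = new RegExp("(?<zip>\\d{5})[- ]?(?<plus4>\\d{4})?");

// Hypothetical helper: returns the captured zip and plus4, or null on no match.
function zipParts(re, input) {
  const m = re.exec(input);
  if (!m) return null;
  return { zip: m.groups.zip, plus4: m.groups.plus4 };
}

const results = {
  originalHyphen: zipParts(original, "90291-3606"), // plus4 is captured
  originalSpace: zipParts(original, "90291 3606"),  // plus4 is left undefined
  proposedSpace: zipParts(proposed, "90291 3606"),  // plus4 is captured
};
console.log(results);
```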

yankeeinlondon commented 6 years ago

There's a certain amount of the parsing logic that I don't think I've fully grokked.

hassansin commented 6 years ago

Yeah, I was thinking of the same fix you guessed. It only fails to parse when the input is nothing but the zip+4 code; the other two test cases pass when the zip is preceded by street/number parts. Validating a zip on its own would be difficult and might require changing lots of the other regexps too.

yankeeinlondon commented 6 years ago

The use case I’m most interested in getting fixed, though, is the one where someone writes a Zip5+4 without the dash. That is quite a reasonable use case, whereas I can’t see anyone using Zip4 without Zip5 preceding it.

hassansin commented 6 years ago

A ZIP5+4 code passes if it's part of a longer address like S Wacker Dr 60606 6306. But it fails when you provide ONLY the ZIP5+4 code without any other address parts, e.g. just 60606 6306. That's what I was talking about, and supporting this would be difficult, IMO.
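For callers who do need the ZIP-only case, one possible workaround is a pre-check that runs before the input is handed to the parser. The sketch below is not part of parse-address; `ZIP_ONLY` and `parseZipOnly` are hypothetical names, and the pattern assumes the whole input is a ZIP5 or ZIP5+4 with a hyphen, a space, or no separator.

```javascript
// Hypothetical helper, not part of parse-address: handles input that is
// nothing but a ZIP5 or ZIP5+4 (hyphen, space, or no separator).
const ZIP_ONLY = /^\s*(?<zip>\d{5})(?:[- ]?(?<plus4>\d{4}))?\s*$/;

function parseZipOnly(input) {
  const m = ZIP_ONLY.exec(input);
  if (!m) return null; // not a bare ZIP; fall through to the full address parser
  const result = { zip: m.groups.zip };
  if (m.groups.plus4 !== undefined) result.plus4 = m.groups.plus4;
  return result;
}

console.log(parseZipOnly("60606 6306"));
console.log(parseZipOnly("S Wacker Dr 60606 6306"));
```

A caller would try `parseZipOnly` first and use the regular parser only when it returns null, so longer addresses are unaffected.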

yankeeinlondon commented 6 years ago

I think that’s a reasonable constraint (although in my case just a zip5+4 is valid) but the pattern I was more concerned about was an address like so: “424 6th Ave, Venice, CA 90291 4444” ... in my test this did not match the zip4 component but did match the rest.

yankeeinlondon commented 6 years ago

FYI ... zip5+4 is enough resolution by itself to identify a congressional district uniquely whereas zip5 is not. Just a little context on the problem I’m trying to solve. :)

hassansin commented 6 years ago

The new changes parse the zip4 component in my testing:

{
    number: '424',
    street: '6th',
    type: 'Ave',
    city: 'Venice',
    state: 'CA',
    zip: '90291',
    plus4: '4444'
}
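As a rough illustration of how a caller might use these two fields, the sketch below joins `zip` and `plus4` into the conventional hyphenated form. `formatZip` is a hypothetical helper, and it assumes only that the result object is shaped like the one above.

```javascript
// Hypothetical helper: builds "12345-6789" from a parsed result, falling back
// to the bare zip when plus4 is absent. Assumes the field names shown above.
function formatZip(parsed) {
  if (!parsed || !parsed.zip) return null;
  return parsed.plus4 ? `${parsed.zip}-${parsed.plus4}` : parsed.zip;
}

const full = formatZip({
  number: '424',
  street: '6th',
  type: 'Ave',
  city: 'Venice',
  state: 'CA',
  zip: '90291',
  plus4: '4444'
});
console.log(full);
```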

I'll take another stab at parsing just a zip5+4, but no promises :). For now, we could just publish what we have to npm.

yankeeinlondon commented 6 years ago

many thanks