davidbuhler-zz opened this issue 8 years ago
Hey David, thanks for your help. It's sort of hilarious: I've been working on a new version that supports exactly this kind of pattern matching. It's not ready yet, but soon this should look something like:
nlp(myText).match('#Value #Street? #City #Country?').tag('#Address')
or something like that. I'll add your street designations and some kind of Street tag. Any thoughts? Cheers.
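For anyone unfamiliar with this kind of tag pattern, here is a rough sketch of how a sequence like `#Value #Street? #City #Country?` could be matched, where `?` marks an optional term. It is plain JavaScript and not compromise's own matcher: the token shape (`{ text, tags }`), the helper names (`parsePattern`, `matchAt`, `findMatches`) and the pre-tagged sample tokens are all hypothetical, and "Main St" is pre-grouped into a single token for brevity.

```javascript
// Hypothetical sketch (not compromise's matcher): match a tag pattern such as
// "#Value #Street? #City #Country?" against tokens shaped like { text, tags: Set }.

// "#Street?" -> { tag: 'Street', optional: true }
function parsePattern(pattern) {
  return pattern.trim().split(/\s+/).map((term) => ({
    tag: term.replace(/^#/, '').replace(/\?$/, ''),
    optional: term.endsWith('?'),
  }));
}

// Try to match all terms starting at token index `start`.
// Returns the index just past the match, or -1 if there is no match.
function matchAt(tokens, terms, start) {
  const step = (ti, ki) => {
    if (ki === terms.length) return ti; // every term satisfied
    const { tag, optional } = terms[ki];
    // First try consuming the current token with this term...
    if (ti < tokens.length && tokens[ti].tags.has(tag)) {
      const end = step(ti + 1, ki + 1);
      if (end !== -1) return end;
    }
    // ...otherwise an optional term may be skipped.
    return optional ? step(ti, ki + 1) : -1;
  };
  return step(start, 0);
}

// Scan left to right, collecting non-overlapping matches as arrays of token text.
function findMatches(tokens, pattern) {
  const terms = parsePattern(pattern);
  const matches = [];
  let i = 0;
  while (i < tokens.length) {
    const end = matchAt(tokens, terms, i);
    if (end > i) {
      matches.push(tokens.slice(i, end).map((t) => t.text));
      i = end;
    } else {
      i += 1;
    }
  }
  return matches;
}

// Hypothetical pre-tagged input; "Main St" is one token for brevity.
const tokens = [
  { text: 'meet', tags: new Set(['Verb']) },
  { text: 'at', tags: new Set(['Preposition']) },
  { text: '100', tags: new Set(['Value']) },
  { text: 'Main St', tags: new Set(['Street']) },
  { text: 'Atlanta', tags: new Set(['City', 'Place']) },
];
console.log(findMatches(tokens, '#Value #Street? #City #Country?'));
// [ [ '100', 'Main St', 'Atlanta' ] ]
```

An optional term is tried as "consumed" first and skipped only if that fails, so each match is as long as the tags allow.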
How has the performance of address resolution been so far? It would be cool to bake in the logic for pulling out parsed numbers, postal/ZIP codes, etc.
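A minimal sketch of pulling postal/ZIP codes out of raw text with regular expressions, assuming only US ZIP (five digits, optionally ZIP+4) and Canadian (A1A 1A1) formats; the name `extractPostalCodes` is hypothetical.

```javascript
// Hypothetical sketch covering only US and Canadian formats.
// US ZIP: 5 digits, optionally "-" plus 4 digits (ZIP+4).
const US_ZIP = /\b\d{5}(?:-\d{4})?\b/g;
// Canadian postal code: letter-digit-letter, optional space, digit-letter-digit.
const CA_POSTAL = /\b[A-Z]\d[A-Z] ?\d[A-Z]\d\b/gi;

function extractPostalCodes(text) {
  return {
    us: text.match(US_ZIP) || [], // match() with /g returns every hit, or null
    ca: text.match(CA_POSTAL) || [],
  };
}

const sample = 'Ship to 12345-6789 or to A1B 2C3, not to 1234 Elm St.';
console.log(extractPostalCodes(sample));
// { us: [ '12345-6789' ], ca: [ 'A1B 2C3' ] }
```

Five-digit street numbers would also match `US_ZIP`, so in practice this would need extra context (for example a nearby State/Province) before treating a hit as a postal code.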
I think performance is tricky for all entity extraction, and I can only speculate about how CoreNLP and GATE handle it without looking more deeply.
When I looked at GATE and JAPE for Place/Location address extraction, I realized there are a lot of permutations for address matching, and GATE really focuses on UK addresses.
GATE/JAPE seems to build up the Address as patterns match, working in order of priority (Address > Object):
- if (PO Box): add if a City exists
- if (Street Number near a street suffix): add if a City exists
- if (Street Number near a street abbreviation): add if a City exists
- if (Postal Code): add if a Province/State exists
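The priority list above could be sketched roughly as follows. This is neither JAPE syntax nor GATE's implementation; the rule names, the small suffix/abbreviation word lists, the `{ text, tags }` token shape and the `Region` tag name (borrowed from the debug output later in this thread) are assumptions for illustration.

```javascript
// Hypothetical sketch of the priority rules. Rule names, word lists,
// the { text, tags } token shape and the 'Region' tag are illustrative only.
const STREET_SUFFIXES = new Set(['street', 'avenue', 'road', 'boulevard']);
const STREET_ABBREVIATIONS = new Set(['st', 'ave', 'rd', 'blvd']);

const tok = (text, ...tags) => ({ text, tags });
const hasTag = (tokens, tag) => tokens.some((t) => t.tags.includes(tag));
const norm = (t) => t.text.toLowerCase().replace(/\.$/, ''); // "St." -> "st"

// True if an all-digit token is followed, within `maxGap` tokens, by a word in `words`.
function numberNear(tokens, words, maxGap = 3) {
  return tokens.some((t, i) =>
    /^\d+$/.test(t.text) &&
    tokens.slice(i + 1, i + 1 + maxGap).some((u) => words.has(norm(u))));
}

// Highest priority first; each pattern also needs a supporting tag in the span.
const RULES = [
  { name: 'PO Box', requires: 'City',
    pattern: (ts) => /\bP\.?\s?O\.?\s*Box\b/i.test(ts.map((t) => t.text).join(' ')) },
  { name: 'Street number + suffix', requires: 'City',
    pattern: (ts) => numberNear(ts, STREET_SUFFIXES) },
  { name: 'Street number + abbreviation', requires: 'City',
    pattern: (ts) => numberNear(ts, STREET_ABBREVIATIONS) },
  { name: 'Postal Code', requires: 'Region',
    pattern: (ts) => ts.some((t) => /^\d{5}$/.test(t.text)) },
];

// Returns the name of the first (highest-priority) rule that fires, or null.
function classifyAddress(tokens) {
  const rule = RULES.find((r) => r.pattern(tokens) && hasTag(tokens, r.requires));
  return rule ? rule.name : null;
}

console.log(classifyAddress([tok('100'), tok('Main'), tok('Street'), tok('Atlanta', 'City', 'Place')]));
// Street number + suffix
```

A span that matches a pattern but lacks the required City/Region tag falls through to lower-priority rules and may end up unclassified, which mirrors the "add if City exists" conditions.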
Cities might need to be added as dictionaries for each State/Province mentioned. I think the patterns would have to be driven by Country, which in turn has to be driven by a State/Province look-up (since most people leave the country out when conversing).
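A sketch of that dependency chain, assuming tiny hand-written tables: `REGION_ALIASES`, `REGION_TO_COUNTRY`, `CITIES_BY_REGION` and `resolvePlace` are hypothetical stand-ins for real gazetteer data. The State/Province mention decides the country, and only that region's city dictionary is consulted.

```javascript
// Hypothetical stand-in tables; real data would come from a gazetteer.
const REGION_ALIASES = new Map([['ga', 'georgia'], ['az', 'arizona']]);
const REGION_TO_COUNTRY = new Map([
  ['georgia', 'United States'],
  ['arizona', 'United States'],
  ['ontario', 'Canada'],
]);
const CITIES_BY_REGION = new Map([
  ['georgia', new Set(['atlanta', 'marietta'])],
  ['arizona', new Set(['phoenix'])],
  ['ontario', new Set(['toronto', 'ottawa'])],
]);

// Resolve a State/Province mention to its country, then check the nearby
// city candidate against that region's dictionary only.
function resolvePlace(regionText, cityText) {
  const key = regionText.toLowerCase();
  const region = REGION_ALIASES.get(key) || key; // "GA" -> "georgia"
  const country = REGION_TO_COUNTRY.get(region);
  if (!country) return null; // unknown region: no country, no city lookup
  const cities = CITIES_BY_REGION.get(region) || new Set();
  return { region, country, city: cities.has(cityText.toLowerCase()) ? cityText : null };
}

console.log(resolvePlace('GA', 'Marietta'));
// { region: 'georgia', country: 'United States', city: 'Marietta' }
```

A real table would also have to handle ambiguous names; "Georgia", for instance, is both a US state and a country.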
NLPC would need a property for nearby tokens in order to perform a lookup. I think the most efficient approach is to perform a proximity lookup only when a common State/Province is mentioned, but I can't think of how to flag a State/Province as a likely "Place" in a given context, which would speed things up quite a bit.
For example, a rule: only perform the State/Province look-up if the State/Province is preceded by "at", "in", or "from" and is capitalized.
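The rule above, combined with the proximity idea from the previous paragraph, might look like this. The region list, whitespace tokenization and `radius` of 4 words are assumptions, and `likelyPlaceRegions` and `proximityCandidates` are hypothetical names.

```javascript
// Hypothetical sketch: word lists, tokenization and radius are assumptions.
const REGIONS = new Set(['georgia', 'arizona', 'ontario', 'ga', 'az']);
const TRIGGERS = new Set(['at', 'in', 'from']);
const clean = (w) => w.replace(/[.,;:!?]+$/, ''); // drop trailing punctuation

// Indices of State/Province mentions that pass the rule:
// previous word is "at"/"in"/"from" AND the mention is capitalized.
function likelyPlaceRegions(words) {
  const hits = [];
  for (let i = 1; i < words.length; i++) {
    const w = clean(words[i]);
    const prev = clean(words[i - 1]).toLowerCase();
    if (TRIGGERS.has(prev) && /^[A-Z]/.test(w) && REGIONS.has(w.toLowerCase())) {
      hits.push(i);
    }
  }
  return hits;
}

// Only for gated mentions, collect the surrounding words that a more
// expensive city/street lookup would then examine (the proximity window).
function proximityCandidates(words, radius = 4) {
  return likelyPlaceRegions(words).map((i) => ({
    region: clean(words[i]),
    nearby: [
      ...words.slice(Math.max(0, i - radius), i),
      ...words.slice(i + 1, i + 1 + radius),
    ],
  }));
}

console.log(proximityCandidates('I moved from Georgia, last year'.split(' ')));
// one candidate: region 'Georgia', nearby words I, moved, from, last, year
```

As stated, the rule skips mentions like "Marietta, Georgia", where the State follows a city rather than "at"/"in"/"from", so it trades some recall for speed.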
@spencermountain when I try "Atlanta" or "Marietta", doc.debug() does not recognize them as a Place. How can I add them to the dictionary?
'Phoenix' - TitleCase, City, Place, Singular, Noun, ProperNoun
'AZ' - Noun, Acronym, Singular, Region, Place, ProperNoun
'atlanta' - Noun, Singular
'georgia' - Region, Place, Singular, Noun, ProperNoun
'marietta' - Noun, Singular
nm, got it.
let doc = nlp('Phoenix AZ atlanta georgia marietta', {Atlanta: 'Place', Marietta: 'Place'});
doc.debug();
Actually, is this the optimal way of doing it?
@spencermountain what cities are included in the library? Where can I get that list?
@playground please look around before asking; it's pretty easy to find: ./data/words/places/cities.js
Entity extraction should include address extraction when the entity is a place.
I believe we can avoid needing a dictionary of street names if there is a pattern match for (number)(string)(comma)(optional string or number)(city).
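A regex sketch of that pattern, assuming the optional middle part is itself followed by a comma and that cities come from a small stand-in list; `CITIES`, `buildAddressRegex` and `extractAddresses` are hypothetical, and `String.prototype.matchAll` needs a reasonably modern runtime (ES2020).

```javascript
// Hypothetical sketch; CITIES is a tiny stand-in for a real city dictionary.
const CITIES = ['Atlanta', 'Marietta', 'Phoenix'];

// Escape regex metacharacters so city names are matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// (number) (street text) , [ (string or number) , ] (city)
function buildAddressRegex(cities) {
  const cityAlt = cities.map(escapeRegExp).join('|');
  return new RegExp(
    String.raw`\b(\d+)\s+([A-Za-z][A-Za-z .]*?),\s*` + // number + street, then comma
      String.raw`(?:([A-Za-z0-9#][A-Za-z0-9# .]*?),\s*)?` + // optional unit/extra part
      `(${cityAlt})\\b`, // a known city
    'g'
  );
}

function extractAddresses(text, cities = CITIES) {
  return [...text.matchAll(buildAddressRegex(cities))].map((m) => ({
    number: m[1],
    street: m[2],
    extra: m[3] || null,
    city: m[4],
  }));
}

console.log(extractAddresses('Send it to 100 Main St, Apt 4, Marietta please'));
// one match: number '100', street 'Main St', extra 'Apt 4', city 'Marietta'
```

Without a street-name dictionary or suffix check, this will also accept non-addresses such as "3 cats, Atlanta", so the city list ends up doing most of the filtering.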