pelias / parser

natural language classification engine for geocoding
https://parser.demo.geocode.earth
MIT License

Classifying more addresses for Norway #171

Closed mansoor-sajjad closed 1 year ago

mansoor-sajjad commented 1 year ago

:wave: Adding more street types and directionals and activating the CompoundStreetClassifier and DirectionalClassifier for Norway.


Here's the reason for this change :rocket:

We have a national dataset of almost 2.5 million addresses. We have found that pelias-parser is able to classify most of them as addresses. 💯 However, almost 120,000 addresses are not classified as addresses, and as a result the address layer is filtered out by the pelias-api.

Norway has a lot of compound street names, but the CompoundStreetClassifier is not activated for the Norwegian dictionary (nb). Just by activating the CompoundStreetClassifier for Norway, we are able to classify almost 52,000 more addresses. 🎉
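To illustrate why compound street names need their own handling, here is a minimal sketch of suffix-based matching. It is written in Python for brevity and is not the parser's actual CompoundStreetClassifier; the suffix list and the function names `is_compound_street` and `find_compound_streets` are hypothetical and do not reflect the real contents of the nb dictionary.

```python
# Hypothetical, abbreviated list of Norwegian street-type suffixes.
# The real nb street_types dictionary is assumed to be larger.
STREET_SUFFIXES = ("gata", "gate", "veien", "vegen", "vei", "veg")


def is_compound_street(token, suffixes=STREET_SUFFIXES):
    """Return True if token is one word that ends in a street-type suffix.

    A bare suffix such as "gata" is not a compound, so at least one
    character must precede the suffix.
    """
    word = token.strip().lower()
    if not word or " " in word:
        return False
    return any(word.endswith(s) and len(word) > len(s) for s in suffixes)


def find_compound_streets(text):
    """Return the whitespace-separated tokens that look like compound streets."""
    return [t for t in text.split() if is_compound_street(t)]
```

With this sketch, a token such as "Kirkeveien" is recognised because the street type is fused onto the name, which a classifier that only looks for standalone street-type words would miss.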

Norwegian addresses also contain directional tokens. We have activated the DirectionalClassifier and added some more directional tokens to the Norwegian dictionary (nb). In addition, we have extended the list of street_types. Together, these changes help classify almost 20,000 additional addresses. 🎉

We still have almost 50,000 addresses in Norway which the pelias-parser fails to classify as addresses. 😞 We have created the following separate issues to discuss solutions for those addresses:

- Option to do the address parsing for a specific country
- Option to not use the WhosonFirstClassifier for AddressParser
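As a rough sanity check, the approximate counts quoted in this description can be tallied as follows. The figures are the rounded "almost" values from the text, not exact measurements, and the variable names are illustrative only.

```python
# Rounded figures taken from the description above (all approximate).
total_addresses = 2_500_000
unclassified_before = 120_000
gained_compound_street = 52_000          # CompoundStreetClassifier enabled
gained_directional_and_types = 20_000    # DirectionalClassifier + street_types

# Addresses still unclassified after both changes.
remaining = unclassified_before - gained_compound_street - gained_directional_and_types

# Share of the dataset that was unclassified before and after, in percent.
share_before = 100 * unclassified_before / total_addresses
share_after = 100 * remaining / total_addresses

print(f"remaining: {remaining}")
print(f"unclassified before: {share_before:.1f}%, after: {share_after:.1f}%")
```

The tally gives about 48,000 remaining addresses, in line with the "almost 50,000" figure above.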