pelias / parser

natural language classification engine for geocoding
https://parser.demo.geocode.earth
MIT License

Option to do the address parsing for a specific country #172

Open mansoor-sajjad opened 1 year ago

mansoor-sajjad commented 1 year ago

The pelias-parser classifiers work by taking all the tokens configured for all the configured countries and applying them to a given address string, irrespective of which country the address belongs to.

For example, the CompoundStreetClassifier reads in the tokens for the following languages:

libpostal.load(this.suffixes, ['de', 'nl', 'sv', 'nb'], 'concatenated_suffixes_separable.txt')
libpostal.load(this.suffixes, ['de', 'nl', 'nb'], 'concatenated_suffixes_inseparable.txt')

The problem is that if we define a street type for German (de) that is not a valid street type for Norwegian (nb), the classifier will still apply the German street types, along with those defined for the other languages, when classifying Norwegian addresses.
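To illustrate why the language information is lost, here is a minimal JavaScript sketch. It assumes the classifier keeps all loaded suffixes in one flat collection, as the `libpostal.load(this.suffixes, [...], ...)` calls above suggest; `SUFFIXES_BY_LANGUAGE` and `loadMergedSuffixes` are hypothetical names, and the suffix lists are illustrative examples rather than the contents of libpostal's dictionary files.

```javascript
// Illustrative subset of per-language street-type suffixes. These lists are
// hypothetical examples, not the actual contents of libpostal's
// concatenated_suffixes_*.txt files.
const SUFFIXES_BY_LANGUAGE = {
  de: ['straße', 'weg'],
  nl: ['straat'],
  sv: ['gatan', 'vägen'],
  nb: ['gata', 'veien'],
};

// Merge the suffixes of several languages into one Set, similar in spirit to
// calling libpostal.load(this.suffixes, ['de', 'nl', 'sv', 'nb'], ...).
// Once merged, the Set no longer records which language a suffix came from.
function loadMergedSuffixes(languages) {
  const merged = new Set();
  for (const lang of languages) {
    for (const suffix of SUFFIXES_BY_LANGUAGE[lang] || []) {
      merged.add(suffix);
    }
  }
  return merged;
}

const suffixes = loadMergedSuffixes(['de', 'nl', 'sv', 'nb']);

// A classifier consulting this Set cannot tell that 'straße' is German:
// it is applied to a Norwegian address just as readily as 'veien'.
console.log(suffixes.has('straße')); // true
console.log(suffixes.has('veien'));  // true
```

Because every lookup goes against the merged set, restricting the parser to one country would require keeping the per-language lists apart until configuration time.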

For example, we want to add 'land' and 'lien' as valid street types for Norway, but doing so affects addresses from other countries. The following unit test for a French address fails once 'lien' is added as a Norwegian street type, presumably because the given name 'Julien' ends in 'lien':

address FRA: Rue de l'Empereur Julien Paris
      expected: |-
        [ { street: 'Rue de l\'Empereur Julien' }, { locality: 'Paris' } ]
      actual: |-
        [ { street: 'Rue de l\'Empereur' }, { locality: 'Julien' } ]
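The failure can be reproduced in miniature with a simplified suffix check. This is a hedged sketch, not the real CompoundStreetClassifier: `matchCompoundStreet` is a hypothetical helper that only models "the token ends with a known street suffix", under the assumption that this is what makes 'Julien' a street candidate once 'lien' is loaded.

```javascript
// Hypothetical, simplified compound-street check: a token counts as a
// compound street name if it ends with a known suffix and has at least one
// character before it (e.g. 'Storgata' = 'Stor' + 'gata').
function matchCompoundStreet(token, suffixes) {
  const lower = token.toLowerCase();
  for (const suffix of suffixes) {
    if (lower.length > suffix.length && lower.endsWith(suffix)) {
      return suffix; // first matching suffix, in Set insertion order
    }
  }
  return null; // no known suffix: not a compound street candidate
}

// Without the proposed Norwegian suffix, 'Julien' is not a street candidate.
const before = new Set(['gata', 'veien']);
// After adding 'lien' for Norway, the French given name 'Julien' matches too.
const after = new Set(['gata', 'veien', 'lien']);

console.log(matchCompoundStreet('Julien', before));   // null
console.log(matchCompoundStreet('Julien', after));    // lien
console.log(matchCompoundStreet('Storgata', after));  // gata
```

Since the suffix set is shared across all countries, the only ways to avoid this false positive are to drop 'lien' entirely or to load it only when Norwegian addresses are being parsed.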

This approach works best when the scope is the whole earth and we don't know in advance which country an address belongs to. In our case, however, we know that our addresses are Norwegian, so it would be useful if the classifiers could optionally be configured to use only the Norwegian street types and ignore those of other countries.

Solution

The proposed solution is to make the set of countries the parser works on configurable. The configuration could be passed in through the options parameter in parser/Parser.js. For pelias-api, the option could be added to pelias-config, from which pelias-api would pass it on to pelias-parser.
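A minimal sketch of what such an opt-in restriction could look like is shown below. Everything in it is hypothetical: the option name `languages`, the `ConfigurableSuffixClassifier` class, and the suffix data are illustrative, not part of pelias-parser or pelias-config. It uses language codes rather than country codes because the libpostal dictionaries loaded above are organized by language (de, nl, sv, nb).

```javascript
// Illustrative subset of per-language suffixes (hypothetical data).
const SUFFIXES_BY_LANGUAGE = {
  de: ['straße', 'weg'],
  nl: ['straat'],
  sv: ['gatan', 'vägen'],
  nb: ['gata', 'veien', 'lien'],
};

// The languages loaded today, regardless of configuration.
const DEFAULT_LANGUAGES = ['de', 'nl', 'sv', 'nb'];

// Hypothetical classifier honouring an optional `languages` option.
// Without the option it keeps the current behaviour (all languages);
// with it, only the requested languages that are supported are loaded.
class ConfigurableSuffixClassifier {
  constructor(options = {}) {
    const requested = options.languages;
    const languages = Array.isArray(requested)
      ? DEFAULT_LANGUAGES.filter((lang) => requested.includes(lang))
      : DEFAULT_LANGUAGES;
    this.suffixes = new Set();
    for (const lang of languages) {
      for (const suffix of SUFFIXES_BY_LANGUAGE[lang]) {
        this.suffixes.add(suffix);
      }
    }
  }

  isKnownSuffix(suffix) {
    return this.suffixes.has(suffix);
  }
}

// Full-earth default: every language's suffixes are active.
const fullEarth = new ConfigurableSuffixClassifier();
// Norway-only deployment, e.g. with the option forwarded from pelias-config.
const norwayOnly = new ConfigurableSuffixClassifier({ languages: ['nb'] });

console.log(fullEarth.isKnownSuffix('straße'));  // true
console.log(norwayOnly.isKnownSuffix('straße')); // false
console.log(norwayOnly.isKnownSuffix('lien'));   // true
```

Making the option default to "all languages" keeps existing full-earth deployments unchanged, so only installations that opt in would see the narrower behaviour.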