pelias / parser

natural language classification engine for geocoding
https://parser.demo.geocode.earth
MIT License

update person schema to allow middle name, add brazil testcase #99

Closed missinglink closed 4 years ago

missinglink commented 4 years ago

Hi @blackmad

I had another look at this after our call (and after eating dinner) and managed to get it working with the following changes.

@Joxit does this look OK to you? It didn't break any tests.

closes https://github.com/pelias/parser/issues/98

missinglink commented 4 years ago

Maybe I should add a new scheme rather than editing the existing one?

missinglink commented 4 years ago

Another way of doing it is to revert this and instead add:

  {
    // Rua Raul Leite Magalhães
    confidence: 0.81,
    Class: StreetClassification,
    scheme: [
      {
        is: ['StreetPrefixClassification'],
        not: ['StreetClassification', 'IntersectionClassification']
      },
      {
        is: ['GivenNameClassification', 'SurnameClassification'],
        not: ['StreetClassification', 'IntersectionClassification']
      },
      {
        is: ['GivenNameClassification', 'SurnameClassification'],
        not: ['StreetClassification', 'IntersectionClassification']
      },
      {
        is: ['GivenNameClassification', 'SurnameClassification'],
        not: ['StreetClassification', 'StreetPrefixClassification']
      }
    ]
  },

Joxit commented 4 years ago

Hi @missinglink, Raul Leite Magalhães is a person, so you should update classifier/scheme/person.js instead of classifier/scheme/street.js.

Add a case for one given name followed by two surnames. This will classify Raul Leite Magalhães as a person, and then https://github.com/pelias/parser/blob/dac59f4c999e82bd5747fa0f03aad42bb3ad360a/classifier/scheme/street.js#L20-L33 will do the job.
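
A minimal sketch of what such an entry could look like, written in the same shape as the scheme entries quoted in this thread. The `PersonClassification` class here is a local stand-in, and the `confidence` value and the `not` lists are placeholders I picked for illustration; none of them are taken from person.js.

```javascript
// Stand-in for the real classification class (assumption: the actual
// class is imported from elsewhere in the parser).
class PersonClassification {}

// Hypothetical entry: one given name followed by two surnames.
const givenNameWithTwoSurnames = {
  // Raul Leite Magalhães
  confidence: 0.5, // placeholder, not a value from the repository
  Class: PersonClassification,
  scheme: [
    {
      is: ['GivenNameClassification'],
      not: ['StreetClassification', 'IntersectionClassification']
    },
    {
      is: ['SurnameClassification'],
      not: ['StreetClassification', 'IntersectionClassification']
    },
    {
      is: ['SurnameClassification'],
      not: ['StreetClassification', 'IntersectionClassification']
    }
  ]
};

// One slot per word of the input phrase.
console.log(givenNameWithTwoSurnames.scheme.length); // 3
```

Whether the middle slot should also accept `GivenNameClassification`, as in the street scheme proposed earlier, probably depends on how the dictionaries classify a word like Leite.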

Joxit commented 4 years ago

Updating person.js will also provide support for inputs like Saint Raul Leite Magalhães

missinglink commented 4 years ago

Nice, I totally forgot how all this works already :P

Joxit commented 4 years ago

Oh, another solution is to add the following just after the GivenNameClassification case:

  {
    // Leite Magalhães
    confidence: 0.25,
    Class: SurnameClassification,
    scheme: [
      {
        is: ['SurnameClassification'],
        not: ['StreetClassification', 'IntersectionClassification']
      },
      {
        is: ['SurnameClassification'],
        not: ['StreetClassification', 'StreetPrefixClassification', 'StopWordClassification']
      }
    ]
  }

This will reclassify Leite Magalhães as Surname and voilà :smile:
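
To illustrate why that two-slot scheme would pick up Leite Magalhães, here is a rough sketch of the matching idea. It assumes each slot matches a word when the word carries at least one of the `is` classifications and none of the `not` classifications. The helpers `slotMatches` and `schemeMatches` are hypothetical and are not the parser's actual matching code.

```javascript
// Hypothetical helper: does one word (given as a list of classification
// names) satisfy one scheme slot?
function slotMatches(slot, labels) {
  const hasAny = slot.is.some((name) => labels.includes(name));
  const hasForbidden = (slot.not || []).some((name) => labels.includes(name));
  return hasAny && !hasForbidden;
}

// Hypothetical helper: a scheme matches when every slot matches the
// word at the same position.
function schemeMatches(scheme, words) {
  return scheme.length === words.length &&
    scheme.every((slot, i) => slotMatches(slot, words[i]));
}

// The two-slot surname scheme from the comment above.
const surnamePair = [
  {
    is: ['SurnameClassification'],
    not: ['StreetClassification', 'IntersectionClassification']
  },
  {
    is: ['SurnameClassification'],
    not: ['StreetClassification', 'StreetPrefixClassification', 'StopWordClassification']
  }
];

// Assumed classifications for "Leite" and "Magalhães".
const leiteMagalhaes = [['SurnameClassification'], ['SurnameClassification']];
console.log(schemeMatches(surnamePair, leiteMagalhaes)); // true
```

Under the same assumption, a second word that also carries `StopWordClassification` would be rejected by the second slot's `not` list.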

missinglink commented 4 years ago

Fixed via rebase.