pelias / labels

Pelias Label generation
https://pelias.io
MIT License
4 stars 9 forks source link

add withOptional flag that will trigger region when needed #42

Open Joxit opened 3 years ago

Joxit commented 3 years ago

I added a commit above #41 that implement my comment https://github.com/pelias/labels/pull/41#issuecomment-773865266

What do you think @missinglink ? The PR for the API is in progress :wink:

missinglink commented 3 years ago

Heya, sorry I didn't reply earlier, I wanted to think about this a bit more before commenting.

I think that generally this is a great approach, namely:

One change I would suggest is around code organisation. The withOptional and meta.optional syntax don't feel very natural to me.

Just throwing out an idea, but could we change that so that each 'generator' (such as fr for instance) could have multiple sub-generators, so fr.verbose or fr.succinct for example.

I think that has a few advantages:

Not sure, does that make sense?

Joxit commented 3 years ago

If I understand correctly, the schema should look something like this ?


  'FRA': {
    'valueFunctions': {
      'local': getFirstProperty(['locality', 'localadmin']),
      'country': getFRACountryValue()
    },
    'verbose': {
      'valueFunctions': {
        'local': getFirstProperty(['locality', 'localadmin']),
        'regional': getFRARegionValue(),
        'country': getFRACountryValue()
      },
      'meta': {
        'separator': ',' // This is useless for FRA
      }
    },
    'meta': {
      'separator': ',' // This is useless for FRA
    }
  }

I like the term verbose, but I don't think it will be enough for what I have in mind.

I want to expand the best part of the label based on the results. That means:

  1. For Bagneux, France, I will expand the regional part

    Bagneux, Hauts-de-Seine, France
    Bagneux, Deux-Sèvres, France
    ...
  2. For Starbucks, New York, NY, USA, I will expand the street part

    Starbucks, 5th Avenue, New York, NY, USA
    Starbucks, 6th Avenue, New York, NY, USA
    ...
  3. For Police nationale, Marseille, France, I will expand the borough (maybe with street & only in France)

    Police nationale, 12e Arr., Marseille, France
    Police nationale, 16e Arr., Marseille, France
    ...

With an option like withOptional and verbose, the results for Starbucks an Police nationale will include the regional part which is not relevant because it is not discriminatory.


Maybe I will need metadata from the API to identify similar parts... Or pelias-label should take the result list...

:warning: MIDNIGHT DRAFT

Metadata example for Bagneux, France:

{
        "country": 2, // same name and same ID
        "macroregion": 0, // no match
        "region": 0, // no match
        "macrocounty": 0, // no match
        "county": 0, // no match
        "localadmin": 1, // same name
        "locality": 1, // same name
        "continent": 2, //same name and same ID
}

Metadata example for Starbucks, New York, NY, USA:

{
        "housenumber": 0, // no match
        "street": 0, // no match
        "postalcode": 0, // no match
        "country": 2,  //same name and same ID
        "region": 2,  //same name and same ID
        "county": 2,  //same name and same ID
        "locality": 2,  //same name and same ID
        "borough": 2,  //same name and same ID
        "neighbourhood": 0, // no match
        "continent": 2,  //same name and same ID
}
missinglink commented 3 years ago

hmm yeah sorry if I'm actually making this more complicated 😆 I just look at this code (even prior to this PR) and think there must be a better way to do it 🤔

Just throwing out an idea here but if we returned an array of objects from the generator, we could then pass it up to the API layer and let it make the decision about how to .join() it.

So off the top of my head, something like this:

var doc = thePeliasDocForBagneux()
var parts = generator.fr(doc)

/**
  parts would be equal to:

  [
    { label: 'Bagneux', layer: 'locality', role: 'required' },
    { label: 'Hauts-de-Seine', layer: 'region', role: 'optional' },
    { label: 'France', layer: 'country', role: 'required' },
  ]

**/

// Get the existing 'simple' label
var simple = (p) => ['required'].includes(p.role)
var str = parts.filter(simple).join(', ')

// Get the label with 'optional' parts
var extended = (p) => ['required','optional'].includes(p.role)
var str = parts.filter(extended).join(', ')

The generator would just be responsible for specifying the order of the array items, what is included or discarded from the array and what roles are assigned based on the document layer.

The stringification could be performed either within this repo or in the pelias/api codebase, whichever makes the most sense.

🤷‍♂️

Joxit commented 3 years ago

Hum... Yes and in this case, if we return an array to the API:

  1. We do the simple join
  2. We check duplicates
  3. On duplicates, we looking for the left most discriminatory optional label on the first two documents
  4. We recreate the label with required ones and the discriminatory label

We should not forget to return the separator character (space in Korea)

This is of course a breaking change

Joxit commented 3 years ago

Hi there, I updated this branch with https://github.com/pelias/api/pull/1507 to apply your idea @missinglink.

This is backward compatible, I added a new function partsGenerator that will return the new object:

{
  parts:   [
    { label: 'Bagneux', layer: 'locality', role: 'required' },
    { label: 'Hauts-de-Seine', layer: 'region', role: 'optional' },
    { label: 'France', layer: 'country', role: 'required' },
  ],
  separator: ', '
}

role has only two values, I don't know if one day we will use it with other values there or if we could simplify with a boolean required: true :shrug:

Anyway, the result looks very good !

:arrow_right: Police nationale, Marseille, France

Police nationale, 12th Arr., Marseille, France
Police nationale, 16th Arr., Marseille, France
Police nationale, 2nd Arr., Marseille, France
Police nationale, 3rd Arr., Marseille, France
Police nationale, 14th Arr., Marseille, France
Police nationale, 13th Arr., Marseille, France
Police nationale, 8th Arr., Marseille, France
Police nationale, 11th Arr., Marseille, France
Police nationale, 1st Arr., Marseille, France
Police nationale, 10th Arr., Marseille, France

:arrow_right: Bagneux, France

Bagneux, Hauts-de-Seine, France
Bagneux, Deux-Sèvres, France
Bagneux, Indre, France
Bagneux, Marne, France
Bagneux, Allier, France
Bagneux, Meurthe-et-Moselle, France
Bagneux, Aisne, France

:arrow_right: Starbucks, New York

Starbucks, 5th Avenue, New York, NY, USA
Starbucks, 6th Avenue, New York, NY, USA
Starbucks, 3rd Avenue, New York, NY, USA
Starbucks, West 181st Street, New York, NY, USA
Starbucks, 7th Avenue, New York, NY, USA
Starbucks, 1st Avenue, New York, NY, USA
Starbucks, East 93rd Street, New York, NY, USA
Starbucks, East 90th Street, New York, NY, USA
Starbucks, West 145th Street, New York, NY, USA
Starbucks, West 23rd Street, New York, NY, USA
Joxit commented 2 years ago

Hello there, I did an update of this PR

Now I have two questions. Do you think this PR is still interesting for the project? Otherwise I will stop updating it and close it.

I saw that there were spaces in Japanese labels. Maybe this PR is a perfect use case to handle this? https://github.com/pelias/labels/blob/73ef845cfc09e19ff6aa40ed2418995da7bd0d15/builders/JPN-JPN.js#L73-L75

let labelParts = _.concat(postalcode, { label: ' ', role: 'newline' }, admin, distric, block, { label: ' ', role: 'newline' }, venue); 
// OR
let labelParts = _.concat(postalcode, { label: ' ', role: 'separator' }, admin, distric, block, { label: ' ', role: 'separator' }, venue); 

Also, the space seems to be ignored when the label is built: https://github.com/pelias/labels/blob/73ef845cfc09e19ff6aa40ed2418995da7bd0d15/test/labelGenerator_JPN_JPN.js#L137-L142

Joxit commented 2 years ago

I added some minor fixes, now this is ready to review/merge

orangejulius commented 2 years ago

We really need to test this out, you've done a lot of work here. Stand by :)