pelias / model

Pelias data models

add deduplication post processing script #118

Closed missinglink closed 5 years ago

missinglink commented 5 years ago

I was inspecting some raw Elasticsearch results today and noticed a lot of duplication in the WOF data for the name.default field.

This PR adds deduplication of these arrays using a fancy new post-processing script.

e.g.

    "default": [
      "New York",
      "New York"
    ]

    "default": [
      "New York Mills",
      "New York Mills",
      "New York Mls"
    ]
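
For illustration, a minimal sketch of what such a deduplication step could look like. This is not the script added in this PR: the function name dedupeNames and the shape of the name object (language keys mapping to strings or arrays of strings) are assumptions made for the example.

```javascript
// Hypothetical sketch, not the actual post-processing script from this PR.
// Removes exact duplicate strings from every array-valued name field.
function dedupeNames(name) {
  const result = {};
  for (const [lang, value] of Object.entries(name)) {
    // A Set keeps first-seen insertion order, so the first occurrence of
    // each value stays in its original position.
    result[lang] = Array.isArray(value) ? [...new Set(value)] : value;
  }
  return result;
}

const deduped = dedupeNames({
  default: ['New York Mills', 'New York Mills', 'New York Mls']
});
console.log(deduped.default); // [ 'New York Mills', 'New York Mls' ]
```

Only exact string matches are collapsed here, so variants such as "New York Mls" are kept.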

orangejulius commented 5 years ago

Makes sense. I'm actually working through some WOF code and keep seeing stuff like this:

    var expected = [
      {
        id: 12345,
        name: 'label:spa_x_preferred_longname value',
        name_aliases: [
          'label:spa_x_preferred_longname value',
          'label:eng_x_preferred_longname value'
        ],
...

So that could be where this is coming from.
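
A small sketch of how that could produce the duplicates, assuming (purely for illustration, not confirmed anywhere in this thread) that the default names are built from the primary name followed by all aliases. The record below mirrors the fixture quoted above; defaultNames is a hypothetical variable.

```javascript
// Hypothetical record shaped like the test fixture quoted above.
const record = {
  name: 'label:spa_x_preferred_longname value',
  name_aliases: [
    'label:spa_x_preferred_longname value',
    'label:eng_x_preferred_longname value'
  ]
};

// Assumption: default names = primary name followed by every alias.
const defaultNames = [record.name, ...record.name_aliases];

console.log(defaultNames.length);        // 3 entries, the first two identical
console.log(new Set(defaultNames).size); // 2 distinct values
```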