pelias / schema

elasticsearch schema files and tooling

add peliasDefaultSimilarity #430

Closed missinglink closed 4 years ago

missinglink commented 4 years ago

This PR adds a peliasDefaultSimilarity similarity module and configures it with the defaults for BM25 (making this a no-op).

curl -s 'localhost:9200/_settings' | jq '.pelias.settings.index.similarity'
{
  "peliasDefaultSimilarity": {
    "type": "BM25",
    "b": "0.75",
    "k1": "1.2"
  }
}

The advantage of using a custom similarity over using the default (which is also BM25) is that the values of k1 and b can be configured.
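For context on what these two parameters control, the textbook BM25 score of a document D for a query Q with terms q_i is sketched below. This is the standard formulation, not something taken from this PR, and Lucene's implementation may differ in constant factors and in how field length is encoded.

```latex
\mathrm{score}(D, Q) = \sum_{i} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
```

Here f(q_i, D) is the frequency of term q_i in D, |D| is the field length and avgdl is the average field length across the index. Setting b = 0 disables length normalization. Setting k1 = 0 reduces the fraction to 1 for any matching term, so term frequency no longer influences the score.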

One way to change the BM25 settings is to edit pelias.json before creating the index, like so:

{
  "elasticsearch": {
    "settings": {
      "index": {
        "similarity": {
          "peliasDefaultSimilarity": {
            "k1": 0,
            "b": 0
          }
        }
      }
    }
  }
}

curl -s 'localhost:9200/_settings' | jq '.pelias.settings.index.similarity'
{
  "peliasDefaultSimilarity": {
    "type": "BM25",
    "b": "0",
    "k1": "0"
  }
}

Alternatively, it seems possible to avoid reindexing an existing index by closing the index, changing the setting and reopening it:

curl -sX POST 'localhost:9200/pelias/_close'

curl -sX PUT -H 'Content-Type: application/json' 'localhost:9200/pelias/_settings' -d '{
  "index": {
    "similarity": {
      "peliasDefaultSimilarity": {
        "type": "BM25",
        "k1": 0,
        "b": 0
      }
    }
  }
}'

curl -sX POST 'localhost:9200/pelias/_open'
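The three calls above could be wrapped in a small script so that the index is reopened even if the settings update fails. This is a minimal sketch, assuming the index is named pelias and Elasticsearch listens on localhost:9200 as in the commands above. The function names similarity_payload and update_similarity and the ES_URL and ES_INDEX variables are hypothetical and not part of Pelias.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pelias/schema.
# ES_URL and ES_INDEX are assumed names; defaults mirror the commands above.
ES_URL="${ES_URL:-localhost:9200}"
ES_INDEX="${ES_INDEX:-pelias}"

# Print the settings body for the given k1 ($1) and b ($2) values.
similarity_payload() {
  printf '{"index":{"similarity":{"peliasDefaultSimilarity":{"type":"BM25","k1":%s,"b":%s}}}}' "$1" "$2"
}

# Close the index, apply the new BM25 parameters, then reopen it.
# The reopen is attempted even when the PUT fails, so the index is not left closed.
update_similarity() {
  local k1="$1" b="$2" status=0
  curl -sfX POST "$ES_URL/$ES_INDEX/_close" >/dev/null || return 1
  curl -sfX PUT -H 'Content-Type: application/json' \
    "$ES_URL/$ES_INDEX/_settings" \
    -d "$(similarity_payload "$k1" "$b")" >/dev/null || status=1
  curl -sfX POST "$ES_URL/$ES_INDEX/_open" >/dev/null || status=1
  return "$status"
}
```

Calling update_similarity 0 0 and then re-running the curl and jq check shown earlier should confirm whether the new values were applied. The -f flag makes curl return a non-zero exit code on HTTP errors, which is what lets the function report a failed update.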

Note: it's not safe to switch between disparate similarity modules. Changing these two BM25 settings is OK, but changing anything that modifies what gets stored in the index will require a full reindex. See the link above for more info.

This PR is, by default, a no-op, but allows us to experiment with the k1 and b settings.

connected to https://github.com/pelias/schema/issues/408

missinglink commented 4 years ago

I also made some unrelated changes to settings.js: https://github.com/pelias/schema/pull/430/files#diff-2d78dada12f5fa4f3183bc45f30f590c, @orangejulius could you please spot check this for me?