spraakbanken / karp-backend

Karp backend
MIT License

Freetext only searching in non-nested fields #288

Closed majsan closed 2 months ago

majsan commented 2 months ago

With the update that introduced type="nested" for all objects in collections, freetext search stopped matching those fields. It now only searches non-nested fields.

The correct ES query must add a nested clause for each nesting level and put the multi_match query inside the innermost one, for example:

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "granska",
            "fields": [
              "*"
            ],
            "lenient": true
          }
        },
        {
          "nested": {
            "path": "SOLemman",
            "query": {
              "nested": {
                "path": "SOLemman.lexem",
                "query": {
                  "multi_match": {
                    "query": "granska",
                    "fields": [
                      "SOLemman.lexem.*"
                    ],
                    "lenient": true
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
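
A minimal Python sketch of how a query of this shape could be assembled from a list of nested paths. The function names nested_freetext_clause and freetext_query are hypothetical and not taken from karp-backend, and the sketch assumes that every prefix of a dotted path (here SOLemman and SOLemman.lexem) is itself mapped as nested:

```python
def nested_freetext_clause(path, query):
    """Wrap a multi_match on `path.*` in one nested clause per nesting level.

    Hypothetical helper: assumes every prefix of `path` is a nested field.
    """
    clause = {
        "multi_match": {
            "query": query,
            "fields": [f"{path}.*"],
            "lenient": True,
        }
    }
    parts = path.split(".")
    # Wrap from the innermost level outwards, e.g. "a.b" first, then "a".
    for depth in range(len(parts), 0, -1):
        clause = {"nested": {"path": ".".join(parts[:depth]), "query": clause}}
    return clause


def freetext_query(query, nested_paths):
    """Combine the top-level multi_match with one clause per nested path."""
    top_level = {
        "multi_match": {"query": query, "fields": ["*"], "lenient": True}
    }
    should = [top_level]
    should += [nested_freetext_clause(path, query) for path in nested_paths]
    return {"query": {"bool": {"should": should}}}
```

Calling freetext_query("granska", ["SOLemman.lexem"]) builds the same structure as the query above.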
majsan commented 2 months ago

Fixed.

The fix generates more clauses than necessary, because mapping_repo currently only knows about flat fields, not the hierarchy. For example, if fields a, a.b and a.c are nested, the following clauses are generated, wrapped in a bool should:

{
  "nested": {
    "path": "a",
    "query": {
      "nested": {
        "path": "a.b",
        "query": { match query for a.b.* }
      }
    }
  }
},
{
  "nested": {
    "path": "a",
    "query": {
      "nested": {
        "path": "a.c",
        "query": { match query for a.c.* }
      }
    }
  }
}

But the bool should query could instead be placed inside the nested query for a, so that a is only wrapped once. Not sure how this affects performance.
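
A rough sketch of that alternative, in Python. It first rebuilds the hierarchy from the flat dotted paths and then emits one nested clause per field, with a bool should holding the field's own match query and the clauses of its children. build_tree and nested_clauses are hypothetical names, not part of mapping_repo, and the sketch assumes that every intermediate prefix of a path is itself nested:

```python
def build_tree(nested_paths):
    """Turn flat dotted paths into a tree of dicts, e.g. {"a": {"b": {}, "c": {}}}."""
    tree = {}
    for path in nested_paths:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


def nested_clauses(tree, query, prefix=""):
    """Emit one nested clause per field, with child clauses inside the parent."""
    clauses = []
    for name, children in sorted(tree.items()):
        path = f"{prefix}{name}"
        own = {
            "multi_match": {
                "query": query,
                "fields": [f"{path}.*"],
                "lenient": True,
            }
        }
        child_clauses = nested_clauses(children, query, prefix=f"{path}.")
        # Only add a bool should when there are children to combine with.
        if child_clauses:
            inner = {"bool": {"should": [own] + child_clauses}}
        else:
            inner = own
        clauses.append({"nested": {"path": path, "query": inner}})
    return clauses
```

For the fields a, a.b and a.c this yields a single nested clause for a instead of one per leaf. Whether that is actually faster in ES would still have to be measured.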