koursaros-ai / nboost

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g. Elasticsearch).
Apache License 2.0
675 stars 69 forks

Complex query #24

Closed. robinalexandre closed this issue 4 years ago

robinalexandre commented 4 years ago

Hello there,

Firstly, thank you for your work.

I have a question about complex queries. I looked into es.py in the codex folder, and I can see that you look for body['query']['match'] or body['query']['match']['query'] to find the query.

My question is: for a complex query like the one below, is it possible to use nboost? When I tried it, I think my query was just proxied to Elasticsearch without any post-processing.

{
  "size": 11,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "text": {
                  "query": "fréquenc recyclag format conducteur professionnel",
                  "operator": "and"
                }
              }
            },
            {
              "match": {
                "text": {
                  "query": "fréquenc recyclag format conducteur professionnel",
                  "operator": "or"
                }
              }
            }
          ]
        }
      },
      "script_score": {
        "script": {
          "source": "1 + ((5 - doc[\"priority\"].value) / 10.0) + ((doc[\"branch\"].value == \"All\") ? 0.5 : 0)"
        }
      }
    }
  }
}

Thanks,

Alexandre

pertschuk commented 4 years ago

No - we currently only support simple queries. It seemed like a rabbit hole to try and support every ES query format.

However, it would be simple to recurse down to the child "query" key in the Python dict and return that value as the query, with some function like this:

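def _finditem(obj, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        # ES clauses such as bool.should are lists, so descend into them too
        children = obj
    else:
        return None
    for child in children:
        item = _finditem(child, key)
        if item is not None:
            return item
    return None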

You could first find the 'match' key, and then pass that result back through the same function to find its child 'query' key. I would encourage you to fork and submit a PR if you think you have a generalizable solution.
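As a rough sketch of that two-step lookup (find 'match' first, then 'query' inside it), something like the following could work. This is not nboost code: find_first and extract_query_text are hypothetical names, the body is a trimmed copy of the query from the question, and the fallback for the short form {"match": {"field": "text"}} is an assumption about what would be wanted.

```python
def find_first(obj, key):
    """Return the first value stored under `key`, searching dicts and lists depth-first."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_first(child, key)
        if found is not None:
            return found
    return None


def extract_query_text(body):
    """Hypothetical helper: locate the first 'match' clause, then its 'query' text."""
    match_clause = find_first(body, "match")
    if not isinstance(match_clause, dict):
        return None
    text = find_first(match_clause, "query")
    if text is None:
        # Assumed fallback for the short form {"match": {"field": "some text"}}
        text = next((v for v in match_clause.values() if isinstance(v, str)), None)
    return text


QUERY_TEXT = "fréquenc recyclag format conducteur professionnel"

# Trimmed version of the function_score query from the question
body = {
    "size": 11,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        {"match": {"text": {"query": QUERY_TEXT, "operator": "and"}}},
                        {"match": {"text": {"query": QUERY_TEXT, "operator": "or"}}},
                    ]
                }
            }
        }
    },
}

print(extract_query_text(body))
```

Note that this only returns the first match clause it reaches, and a document field that is itself named "query" would confuse the second lookup, so it is a starting point rather than a general solution.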

Thanks!

colethienes commented 4 years ago

I just added support for dynamically selecting the fields to rerank. You can use the --choices_path, --cids_path and --cvalues_path options to select choices dynamically. This works via JSONPath. For example, your case would need --choices_path body.query.function_score.query.bool.should.[*].match. You also need to set the choice IDs path and choice values path, as noted above.
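To get a feel for what a path like body.query.function_score.query.bool.should.[*].match selects, here is a tiny hand-rolled resolver. It is only an illustration: it is not a real JSONPath implementation and not how nboost resolves paths, select is a hypothetical name, and wrapping the request under a "body" key is an assumption taken from the path above.

```python
def select(data, path):
    """Illustrative resolver for dotted paths with a [*] wildcard (not real JSONPath)."""
    current = [data]
    for step in path.split("."):
        matched = []
        for node in current:
            if step == "[*]":
                # Wildcard: fan out over every element of a list
                if isinstance(node, list):
                    matched.extend(node)
            elif isinstance(node, dict) and step in node:
                matched.append(node[step])
        current = matched
    return current


QUERY_TEXT = "fréquenc recyclag format conducteur professionnel"

# Assumed shape: the Elasticsearch request body nested under a "body" key
request = {
    "body": {
        "size": 11,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "should": [
                            {"match": {"text": {"query": QUERY_TEXT, "operator": "and"}}},
                            {"match": {"text": {"query": QUERY_TEXT, "operator": "or"}}},
                        ]
                    }
                }
            }
        },
    }
}

selected = select(request, "body.query.function_score.query.bool.should.[*].match")
for clause in selected:
    print(clause["text"]["operator"], "->", clause["text"]["query"])
```

With this input the path resolves to the two match clauses under bool.should, one per list element.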