inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

Google-style search syntax #682

Open jmartinm opened 8 years ago

jmartinm commented 8 years ago

Following the agreement in https://github.com/inspirehep/inspire-next/issues/609, Google-style search syntax should be implemented.

{
    "multi_match" : {
        "query" : "{}",
        "operator" : "or",
        "zero_terms_query": "all",
        "fields": ["title^3", "title.raw^10", "abstract^2", "abstract.raw^4", "author^10", "author.raw^15", "reportnumber^10", "eprint^10", "doi^10", ...]
    }
}
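
As a rough illustration of how the "{}" placeholder could be filled in, here is a minimal Python sketch that builds the multi_match body from a free-text query. The function name build_free_text_query and the FIELD_BOOSTS table are hypothetical, and the table only contains the fields visible in the proposal (the original list is elided with "..."), so treat it as a sketch rather than the actual inspire-next code.

```python
import json

# Boosts copied from the visible part of the proposed field list.
# The original list ends in "...", so this table is incomplete by design.
FIELD_BOOSTS = {
    "title": 3,
    "title.raw": 10,
    "abstract": 2,
    "abstract.raw": 4,
    "author": 10,
    "author.raw": 15,
    "reportnumber": 10,
    "eprint": 10,
    "doi": 10,
}


def build_free_text_query(user_query, field_boosts=FIELD_BOOSTS):
    """Return a multi_match query body for a Google-style free-text query.

    Hypothetical helper: it only assembles the JSON structure and does not
    talk to Elasticsearch.
    """
    fields = ["{}^{}".format(name, boost) for name, boost in field_boosts.items()]
    return {
        "multi_match": {
            "query": user_query,
            "operator": "or",
            "zero_terms_query": "all",
            "fields": fields,
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_free_text_query("photomultiplier banana"), indent=4))
```

Keeping the boosts in one table would make it easy to tune them later without touching the query-building code.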
Panos512 commented 8 years ago

Regarding the extra point: if we add "explain": true to the query, we get an explanation of the results, including the words that matched.

e.g.

For the query photomultiplier banana:

GET hep/record/_search
{
    "query": {
        "multi_match": {
            "query": "photomultiplier  banana",
            "fields": ["title^3", "title.raw^10", "abstract^2", "abstract.raw^4", "author^10", "author.raw^15", "reportnumber^10", "eprint^10", "doi^10"],
            "zero_terms_query": "all"
        }
    },
    "explain": true
}

The explanation of the first record match is:

"_explanation": {
               "value": 0.18039167,
               "description": "sum of:",
               "details": [
                  {
                     "value": 0.18039167,
                     "description": "max of:",
                     "details": [
                        {
                           "value": 0.026151827,
                           "description": "product of:",
                           "details": [
                              {
                                 "value": 0.052303653,
                                 "description": "sum of:",
                                 "details": [
                                    {
                                       "value": 0.052303653,
                                       "description": "weight(abstract:photomultiplier in 9) [PerFieldSimilarity], result of:",
                                       "details": [
                                          {
                                             "value": 0.052303653,
                                             "description": "score(doc=9,freq=1.0), product of:",
                                             "details": [
                                                {
                                                   "value": 0.08483324,
                                                   "description": "queryWeight, product of:",
                                                   "details": [
                                                      {
                                                         "value": 9.864747,
                                                         "description": "idf(docFreq=2, maxDocs=21234)",
                                                         "details": []
                                                      },
                                                      {
                                                         "value": 0.008599637,
                                                         "description": "queryNorm",
                                                         "details": []
                                                      }
                                                   ]
                                                },
                                                {
                                                   "value": 0.6165467,
                                                   "description": "fieldWeight in 9, product of:",
                                                   "details": [
                                                      {
                                                         "value": 1,
                                                         "description": "tf(freq=1.0), with freq of:",
                                                         "details": [
                                                            {
                                                               "value": 1,
                                                               "description": "termFreq=1.0",
                                                               "details": []
                                                            }
                                                         ]
                                                      },
                                                      {
                                                         "value": 9.864747,
                                                         "description": "idf(docFreq=2, maxDocs=21234)",
                                                         "details": []
                                                      },
                                                      {
                                                         "value": 0.0625,
                                                         "description": "fieldNorm(doc=9)",
                                                         "details": []
                                                      }
                                                   ]
                                                }
                                             ]
                                          }
                                       ]
                                    }
                                 ]
                              },
                              {
                                 "value": 0.5,
                                 "description": "coord(1/2)",
                                 "details": []
                              }
                           ]
                        },
                        {
                           "value": 0.18039167,
                           "description": "product of:",
                           "details": [
                              {
                                 "value": 0.36078334,
                                 "description": "sum of:",
                                 "details": [
                                    {
                                       "value": 0.36078334,
                                       "description": "weight(title:photomultiplier in 9) [PerFieldSimilarity], result of:",
                                       "details": [
                                          {
                                             "value": 0.36078334,
                                             "description": "score(doc=9,freq=2.0), product of:",
                                             "details": [
                                                {
                                                   "value": 0.13248014,
                                                   "description": "queryWeight, product of:",
                                                   "details": [
                                                      {
                                                         "value": 10.270212,
                                                         "description": "idf(docFreq=1, maxDocs=21234)",
                                                         "details": []
                                                      },
                                                      {
                                                         "value": 0.012899456,
                                                         "description": "queryNorm",
                                                         "details": []
                                                      }
                                                   ]
                                                },
                                                {
                                                   "value": 2.7233012,
                                                   "description": "fieldWeight in 9, product of:",
                                                   "details": [
                                                      {
                                                         "value": 1.4142135,
                                                         "description": "tf(freq=2.0), with freq of:",
                                                         "details": [
                                                            {
                                                               "value": 2,
                                                               "description": "termFreq=2.0",
                                                               "details": []
                                                            }
                                                         ]
                                                      },
                                                      {
                                                         "value": 10.270212,
                                                         "description": "idf(docFreq=1, maxDocs=21234)",
                                                         "details": []
                                                      },
                                                      {
                                                         "value": 0.1875,
                                                         "description": "fieldNorm(doc=9)",
                                                         "details": []
                                                      }
                                                   ]
                                                }
                                             ]
                                          }
                                       ]
                                    }
                                 ]
                              },
                              {
                                 "value": 0.5,
                                 "description": "coord(1/2)",
                                 "details": []
                              }
                           ]
                        }
                     ]
                  },
                  {
                     "value": 0,
                     "description": "match on required clause, product of:",
                     "details": [
                        {
                           "value": 0,
                           "description": "# clause",
                           "details": []
                        },
                        {
                           "value": 0.0042998185,
                           "description": "_type:record, product of:",
                           "details": [
                              {
                                 "value": 1,
                                 "description": "boost",
                                 "details": []
                              },
                              {
                                 "value": 0.0042998185,
                                 "description": "queryNorm",
                                 "details": []
                              }
                           ]
                        }
                     ]
                  }
               ]
            }
         },
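
The numbers in this explanation can be reproduced by hand, which helps when reading such trees. The sketch below assumes the relations printed in the explanation itself: queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm with tf = sqrt(termFreq), and the product scaled by coord(1/2). The helper name term_score is hypothetical; the input values are copied from the output above.

```python
import math


def term_score(freq, idf, query_norm, field_norm, coord):
    """Recompute one per-field score from the values in the explanation."""
    query_weight = idf * query_norm                      # "queryWeight, product of:"
    field_weight = math.sqrt(freq) * idf * field_norm    # "fieldWeight in 9, product of:"
    return query_weight * field_weight * coord           # scaled by "coord(1/2)"


# Values taken from the abstract and title branches of the explanation.
abstract_score = term_score(1.0, 9.864747, 0.008599637, 0.0625, 0.5)
title_score = term_score(2.0, 10.270212, 0.012899456, 0.1875, 0.5)

# multi_match takes the best field here ("max of:"), so the title wins.
best_score = max(abstract_score, title_score)

if __name__ == "__main__":
    print(abstract_score, title_score, best_score)
```

The title branch dominates because the term occurs twice in the title and the title field carries a larger fieldNorm and queryNorm in this response.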

Looking at the "description": "weight(...)" entries, we can see which word matched and in which field.
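
To pull that information out automatically, one option is to walk the _explanation tree and collect every "weight(field:term in N)" description. The following is a minimal sketch; the function name matched_terms and the regular expression are assumptions based only on the description format shown above, which may differ between Elasticsearch versions.

```python
import re

# Matches descriptions such as
# "weight(abstract:photomultiplier in 9) [PerFieldSimilarity], result of:"
WEIGHT_RE = re.compile(r"^weight\((?P<field>[^:]+):(?P<term>.+) in \d+\)")


def matched_terms(explanation):
    """Return (field, term) pairs found anywhere in an _explanation tree."""
    found = []
    match = WEIGHT_RE.match(explanation.get("description", ""))
    if match:
        found.append((match.group("field"), match.group("term")))
    # Recurse into nested details, depth first, preserving order.
    for child in explanation.get("details", []):
        found.extend(matched_terms(child))
    return found
```

Applied to the explanation above, this would report that photomultiplier matched in both abstract and title, while banana does not appear at all.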

ksachs commented 8 years ago

Don't forget the transparency! Especially if foo bar baz behaves differently from key:foo key1:foo1. Btw: I only noticed yesterday that on labs-holdingpen the query "Need action" type:arXiv uri:astro* uri:physics* is in fact interpreted as "Need action" and (type:arXiv or uri:astro* or uri:physics*). It's what you intuitively want, but you might not expect it.
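
One possible way to provide that transparency is to echo the interpreted query back to the user with the operators and grouping spelled out. The sketch below only formats a string for the grouping described in this comment; describe_grouping is a hypothetical helper and does not reflect how labs-holdingpen actually parses queries.

```python
def describe_grouping(required_clause, optional_clauses):
    """Render the interpreted query with explicit operators and parentheses."""
    if not optional_clauses:
        return required_clause
    return "{} and ({})".format(required_clause, " or ".join(optional_clauses))


interpreted = describe_grouping(
    '"Need action"',
    ["type:arXiv", "uri:astro*", "uri:physics*"],
)

if __name__ == "__main__":
    print(interpreted)
```

Showing this string next to the results would let users see at a glance when their query was read differently from what they typed.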

jmartinm commented 6 years ago

This is mostly done. @chris-asl can confirm.

chris-asl commented 6 years ago

The parser part that groups all the queries together into a flat query is already done. But, as mentioned in the OP, we still need to decide which fields are important to query for non-keyword queries, i.e. ValueQueries. Currently, we query the _all field, as seen here.