terascope / teraslice

Scalable data processing pipelines in JavaScript
https://terascope.github.io/teraslice/
Apache License 2.0

xlucene query for glob match on ES text field doesn't match #1460

Open godber opened 4 years ago

godber commented 4 years ago

When doing a query like this: query:"NAME:*US" I get no results despite there being matching records.

Hitting ES:

curl -Ss '127.0.0.1:9200/noaa-isd-*/_search?q=NAME:*US' | jq .hits.total
16680288

Hitting a Spaces API:

curl -sS  'https://127.0.0.1/api/v2/s1?token=MYAWESOMETOKEN&q=NAME:*US' | jq
{
  "info": "0 results found.",
  "total": 0,
  "returning": 0,
  "results": []
}

There are also no results in QueryPoint.

I may be doing something wrong. We can discuss.

godber commented 4 years ago

One noteworthy thing here: both Jared and Peter tried to reproduce this issue but couldn't. Speaking with Jared, I learned that his mapping uses the keyword type for the NAME field, whereas mine is as follows:

          "NAME": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },

My ES version is 6.8.1.
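
Given a mapping like the one above, one way to narrow down where the two paths diverge is to send the same pattern to ES directly as several different query types and compare the hit counts. The sketch below only builds the request bodies. The helper name build_probe_queries and the probe labels are made up for illustration, and nothing here claims to show what the Spaces API or xlucene actually generates. The `?q=` URI search used earlier goes through the query_string parser, so the first probe should mirror that curl call.

```python
import json


def build_probe_queries(field, pattern):
    """Build ES request bodies that apply one pattern in different ways.

    Hypothetical diagnostic helper, not part of teraslice or xlucene.
    """
    return {
        # Same parser the `?q=` URI search uses.
        "query_string": {
            "query": {"query_string": {"query": f"{field}:{pattern}"}}
        },
        # Term-level wildcard against the analyzed text field, pattern as typed.
        "wildcard_text": {
            "query": {"wildcard": {field: {"value": pattern}}}
        },
        # Same, but with the pattern lowercased by hand.
        "wildcard_text_lowercased": {
            "query": {"wildcard": {field: {"value": pattern.lower()}}}
        },
        # Term-level wildcard against the un-analyzed keyword sub-field.
        "wildcard_keyword": {
            "query": {"wildcard": {f"{field}.keyword": {"value": pattern}}}
        },
    }


if __name__ == "__main__":
    for label, body in build_probe_queries("NAME", "*US").items():
        print(label, json.dumps(body))
```

Each printed body can be POSTed to 127.0.0.1:9200/noaa-isd-*/_search with a Content-Type: application/json header. If only some of the probes return hits, that would point at how the pattern interacts with the field's analysis rather than at the data.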

godber commented 4 years ago

Strangely, a similar query for STATION works, even though that field has the same kind of mapping:

          "STATION": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },

These three queries return what I'd expect:

q=STATION:010590*
q=STATION:01059099999
q=STATION:*10590*

godber commented 4 years ago

Note, the examples in this issue are all from a sample NOAA weather station dataset.

https://registry.opendata.aws/noaa-ghcnh/

godber commented 4 years ago

Apparently, if I omit the wildcard, it will match results:

curl -sS  'https://127.0.0.1/api/v2/s1?token=MYAWESOMETOKEN&q=NAME:BANAK' | jq .total
18716

This total happens to match the total returned when querying by STATION ID:

curl -sS  'https://api-noaa.tera1.lan/api/v2/noaa?token=MYAWESOMETOKEN&q=STATION:01059099999' | jq .total
18716
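
One hypothesis that would be consistent with all of the observations so far, offered as an assumption and not a confirmed diagnosis: a text field is analyzed at index time (the standard analyzer lowercases tokens), while a term-level wildcard pattern is compared to the indexed terms as typed. The sketch below simulates that in plain Python. The analyze function is only a rough stand-in for the standard analyzer, and the NAME values are hypothetical examples, not records taken from the index.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Rough stand-in for the ES standard analyzer: split on
    non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def wildcard_hits(pattern, terms):
    """Term-level wildcard: the pattern is NOT analyzed, so it is
    compared case-sensitively against each indexed term."""
    return any(fnmatchcase(term, pattern) for term in terms)


def match_hits(query, terms):
    """Analyzed match: the query text goes through the same analyzer
    before being compared to the indexed terms."""
    return any(tok in terms for tok in analyze(query))


name = "EXAMPLE AIRPORT, US"   # hypothetical NAME value
text_terms = analyze(name)     # ['example', 'airport', 'us']
keyword_terms = [name]         # keyword sub-field keeps the whole value as-is

print(wildcard_hits("*US", text_terms))      # False: indexed terms are lowercase
print(wildcard_hits("*us", text_terms))      # True
print(wildcard_hits("*US", keyword_terms))   # True: original casing preserved

# Digits have no case, so lowercasing does not affect STATION patterns.
print(wildcard_hits("*10590*", analyze("01059099999")))  # True

# Without a wildcard, an analyzed query is lowercased too and matches.
print(match_hits("BANAK", analyze("BANAK, NO")))  # True ("BANAK, NO" is hypothetical)
```

Under this assumption, NAME:*US fails only when the pattern reaches the text field without being lowercased, which would also explain why a keyword-only mapping does not reproduce the problem.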

jsnoble commented 4 years ago

Notes I have gathered:

https://github.com/elastic/kibana/issues/23001
Wildcard queries part: https://www.timroes.de/elasticsearch-kibana-queries-in-depth-tutorial
https://discuss.elastic.co/t/wildcard-query-not-working-as-expected/84447/5