opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
121 stars 140 forks source link

[BUG] Malformed SQL query with "SELECT *FROM ... WHERE ..." doesn't cause error #3124

Open seankao-az opened 3 weeks ago

seankao-az commented 3 weeks ago

What is the bug? Instead of using the proper SQL syntax SELECT * FROM ... WHERE ..., when running SELECT *FROM ... WHERE ... (notice missing space between * and FROM), query still executes unexpectedly, and behaves differently from the proper syntax.

Verified same behavior can be observed in https://opensearch.org/versions/opensearch-1-0-0.html, all the way back to the 1.0 version, and in latest version.

Below are some queries on some sample index opensearch_dashboards_sample_data_ecommerce

# Query (1)
POST /_plugins/_sql
{
    "query": "select *from opensearch_dashboards_sample_data_ecommerce where manufacturer in (Elitelligence, Primemaster) limit 10"
}

# Unexpected. Returns 10 rows
{
    "schema": [
        (abbreviated)
    ],
    "datarows": [
        (abbreviated)
    ],
    "total": 10,
    "size": 10,
    "status": 200
}

This result is unexpected.

Notice that if we modify above query (1) into SELECT * FROM ..., then we get an error:

# Query (2)
POST /_plugins/_sql
{
    "query": "select * from opensearch_dashboards_sample_data_ecommerce where manufacturer in (Elitelligence, Primemaster) limit 10"
}

# Expected. Error.
{
    "error": {
        "reason": "Invalid SQL query",
        "details": "can't resolve Symbol(namespace=FIELD_NAME, name=Elitelligence) in type env",
        "type": "SemanticCheckException"
    },
    "status": 400
}

The correct syntax should be the following, with literals quoted

# Query (3)
POST /_plugins/_sql
{
    "query": "select * from opensearch_dashboards_sample_data_ecommerce where manufacturer in (\"Elitelligence\", \"Primemaster\") limit 10"
}

# Expected. Returns 10 rows.
{
    "schema": [
        (abbreviated)
    ],
    "datarows": [
        (abbreviated)
    ],
    "total": 10,
    "size": 10,
    "status": 200
}

Without where clause

Notice, however, that without the where clause, this query errors out:

# Query (4.1)
POST /_plugins/_sql
{
    "query": "select *from opensearch_dashboards_sample_data_ecommerce limit 10"
}

# Expected. Error. However different result from query (1)
{
  "error": {
    "reason": "Invalid SQL query",
    "details": "can't resolve Symbol(namespace=FIELD_NAME, name=*from) in type env",
    "type": "SemanticCheckException"
  },
  "status": 400
}

with json format

Even more bizarre, same malformed query without where clause but querying with json format returns without error

# Query (4.2)
POST /_plugins/_sql?format=json
{
    "query": "select *from opensearch_dashboards_sample_data_ecommerce limit 11"
}

# Unexpected. Returns 11 docs, but hits.total value seems incorrect
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      { ... },
      { ... },
      { ... },
      { ... },
      { ... },
      { ... },
      { ... },
      { ... },
      { ... },
      { ... },
      { ... },
    ]
  }
}

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Go to "Query Workbench"
  2. Run the example queries

Note that in all the released version I tested, the issue can be observed. However, when tested this on https://playground.opensearch.org/app/home#/ as well, the behavior in playground is slightly different. In query workbench, the malformed query (1) doesn't do anything. It doesn't return result, and also doesn't error out, unlike other malformed query which yields "Opensearch_dashboards_sample_data_ecommerce: Bad Request, this query is not runnable."

What is the expected behavior? SELECT *FROM ... WHERE ... is a malformed query and should've resulted in parsing error and not be executed.

What is your host/environment?

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? Explaining SELECT * FROM ... WHERE ... and SELECT *FROM ... WHERE ... yields different result. Not all queries in the example are explainable, though.

# Explaining query (3) `select * from opensearch_dashboards_sample_data_ecommerce where manufacturer in ("Elitelligence", "Primemaster") limit 10`
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[type, manufacturer, products, customer_last_name, day_of_week_i, total_quantity, currency, taxless_total_price, total_unique_products, event, sku, email, day_of_week, geoip, customer_first_name, customer_phone, customer_birth_date, customer_full_name, order_date, category, customer_id, order_id, user, customer_gender, taxful_total_price]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": "OpenSearchQueryRequest(indexName=opensearch_dashboards_sample_data_ecommerce, sourceBuilder={\"from\":0,\"size\":10,\"timeout\":\"1m\",\"query\":{\"bool\":{\"should\":[{\"term\":{\"manufacturer.keyword\":{\"value\":\"Elitelligence\",\"boost\":1.0}}},{\"term\":{\"manufacturer.keyword\":{\"value\":\"Primemaster\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"_source\":{\"includes\":[\"email\",\"sku\",\"customer_last_name\",\"customer_id\",\"total_quantity\",\"category\",\"taxful_total_price\",\"customer_first_name\",\"currency\",\"order_date\",\"day_of_week_i\",\"type\",\"customer_phone\",\"taxless_total_price\",\"total_unique_products\",\"day_of_week\",\"customer_gender\",\"order_id\",\"customer_full_name\",\"manufacturer\",\"products\",\"customer_birth_date\",\"user\",\"geoip\",\"event\"],\"excludes\":[]},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, searchDone=false)"
        },
        "children": []
      }
    ]
  }
}

# Explaining `select *from opensearch_dashboards_sample_data_ecommerce where manufacturer in ("Elitelligence", "Primemaster") limit 10`
# Note that this is not query (1). The literals are with quotes. Somehow query (1) can be executed but cannot be explained
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "bool": {
                  "should": [
                    {
                      "term": {
                        "manufacturer.keyword": {
                          "value": "Elitelligence",
                          "boost": 1
                        }
                      }
                    },
                    {
                      "term": {
                        "manufacturer.keyword": {
                          "value": "Primemaster",
                          "boost": 1
                        }
                      }
                    }
                  ],
                  "adjust_pure_negative": true,
                  "boost": 1
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}