opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
120 stars 139 forks source link

[BUG] COUNT with aggregation return multiple rows with null value #2716

Open penghuo opened 5 months ago

penghuo commented 5 months ago

What is the bug? curl --request POST \ --url http://localhost:9200/_plugins/_sql?format=jdbc \ --header 'content-type: application/json' \ --data '{"query": "select count(*) from test00001 where status != 200"}'

the result include multiple rows

{
  "schema": [
    {
      "name": "count(*)",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      10
    ],
    [
      null
    ],
    [
      null
    ],
    [
      null
    ],
    [
      null
    ],
    [
      null
    ],
    [
      null
    ],
    [
      null
    ],
    [
      null
    ],
    [
      null
    ]
  ],
  "total": 10,
  "size": 10,
  "

How can one reproduce the bug? Todo

What is the expected behavior? The result only include 1 rows

{
  "schema": [
    {
      "name": "count(*)",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      10
    ]
  ],
  "total": 1,
  "size": 1,
  "

What is your host/environment?

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? Add any other context about the problem.

manasvinibs commented 4 months ago

Not able to repro this issue in main branch for field type Integer on JDBC response format

 curl -X POST "http://localhost:9200/_plugins/_sql?format=jdbc" -H "Content-Type: application/json" -d '{
>   "query": "SELECT count(*) FROM test100 where status != 200"}'
{
  "schema": [
    {
      "name": "count(*)",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      1
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200

But I see if we use a field type text and use the count(*)with != operator the below error -

curl -X POST "http://localhost:9200/_plugins/_sql?format=jdbc" -H "Content-Type: application/json" -d '{
>   "query": "SELECT count(*) FROM test100 where ImageType != \"machine\""
> }'
{
  "error": {
    "reason": "Error occurred in OpenSearch engine: all shards failed",
    "details": "Shard[0]: java.lang.IllegalArgumentException: Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [ImageType] in order to load field data by uninverting the inverted index. Note that this can use significant memory.\n\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.",
    "type": "SearchPhaseExecutionException"
  },
  "status": 400

In JSON format

 curl -X POST "http://localhost:9200/_plugins/_sql?format=json" -H "Content-Type: application/json" -d '{
>   "query": "SELECT count(*) FROM test100 where ImageType != \"machine\""
> }'
{
  "error": {
    "reason": "Invalid SQL query",
    "details": "Unsupported operation: !=",
    "type": "SqlParseException"
  },
  "status": 400
}

So it appears is users are using the count(*) aggregation query with negation operator on text fields, the error is expected as it is not recommended to use the aggregation on text fields. But for Integer fields, issue cannot be reproduced.

manasvinibs commented 4 months ago

Not able to repro for field type "keyword" on latest branch

curl -X GET "http://localhost:9200/test200/_mapping"
{"test200":{"mappings":{"properties":{"ImageType":{"type":"keyword"}}}}} 
 curl -X POST "http://localhost:9200/_plugins/_sql?format=jdbc" -H "Content-Type: application/json" -d '{
>   "query": "SELECT count(*) FROM test200 where ImageType != \"machine\""}'
{
  "schema": [
    {
      "name": "count(*)",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      1
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200
}