opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
120 stars 139 forks source link

[BUG] Incorrect values of quotted nested fields aggregation when `fetch_size` is set #2530

Open smortex opened 8 months ago

smortex commented 8 months ago

What is the bug?

Using the SQL API (with the default format jdbc), combining quoted nested fields and the fetch_size attribute cause unexpected failures (on https://playground.opensearch.org) or incorrect value (on my work instance, opensearch 2.12.0).

How can one reproduce the bug?

On https://playground.opensearch.org

We can use the opensearch_dashboards_sample_data_ecommerce index to show the issue. Run the following 4 commands in the "Dev Tools" console:

POST /_plugins/_sql
{
  "query": "select avg(`products.quantity`) from opensearch_dashboards_sample_data_ecommerce;"
}

POST /_plugins/_sql
{
  "query": "select avg(`products.quantity`) from opensearch_dashboards_sample_data_ecommerce;",
  "fetch_size": 1000
}

POST /_plugins/_sql
{
  "query": "select avg(products.quantity) from opensearch_dashboards_sample_data_ecommerce;"
}

POST /_plugins/_sql
{
  "query": "select avg(products.quantity) from opensearch_dashboards_sample_data_ecommerce;",
  "fetch_size": 1000
}

The command are basically identical and expected to return the same content: command 1 & 2 quote products.quantity between backticks, and command 2 & 4 explicitly set fetch_size to 1000 (the default value).

However, while command 1, 3 and 4 produce the same and expected result (only schema.0.name change as expected):

{
  "schema": [
    {
      "name": "avg(`products.quantity`)",
      "type": "double"
    }
  ],
  "datarows": [
    [
      1.0003965500148706
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200
}

command 2 fail unexpectedly:

{
  "error": {
    "reason": "There was internal problem at backend",
    "details": "org.json.JSONException: JSON does not allow non-finite numbers.",
    "type": "RuntimeException"
  },
  "status": 500
}

On my work system

The above was obtained when trying to reproduce the issue I see on my workstation. There I have an index data1 with a nested field and a few documents:

POST _plugins/_sql/
{
  "query": "SELECT sum(`field_nested.c2`) FROM data1"
}

POST _plugins/_sql/
{
  "query": "SELECT sum(`field_nested.c2`) FROM data1",
  "fetch_size": 1000
}

POST _plugins/_sql/
{
  "query": "SELECT sum(field_nested.c2) FROM data1"
}

POST _plugins/_sql/
{
  "query": "SELECT sum(field_nested.c2) FROM data1",
  "fetch_size": 1000
}

queries 1, 3 and 4 produce the expected result:

{
  "schema": [
    {
      "name": "SUM(field_nested.c2)",
      "type": "double"
    }
  ],
  "total": 1,
  "datarows": [
    [
      6
    ]
  ],
  "size": 1,
  "status": 200
}

but the 2nd query produce an incorrect result:

{
  "schema": [
    {
      "name": "SUM(`field_nested.c2`)",
      "type": "double"
    }
  ],
  "total": 1,
  "datarows": [
    [
      0
    ]
  ],
  "size": 1,
  "status": 200
}

The query return 0 instead of the expected 6.

What is the expected behavior?

All these queries are supposed to be equivalent, so they should return the same value.

What is your host/environment?

Do you have any additional context?

I am not sure if the different behavior is due to the dataset or if this should be 2 distinct bugs.

Swiddis commented 7 months ago

Thanks for the issue!