opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0

[BUG] PPL query with head and sort can not properly rewrite as DSL. #494

Open penghuo opened 2 years ago

penghuo commented 2 years ago

Describe the bug: A PPL query with head followed by sort is not properly rewritten as DSL.

To Reproduce

POST /_plugins/_ppl/_explain
{
  "query": "source=test_0002 | head 1000 | sort - abletter "
}

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[abletter, 11number]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=test_0002, sourceBuilder={"from":0,"size":200,"timeout":"1m","_source":{"includes":["abletter","11number"],"excludes":[]},"sort":[{"abletter":{"order":"desc","missing":"_last"}}]}, searchDone=false)"""
        },
        "children": []
      }
    ]
  }
}

Expected behavior: The size field in the DSL should be 1000 instead of 200.
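
For anyone who wants to check this without reading the explain output by eye, here is a minimal Python sketch that pulls the size out of the request string. It assumes the explain response has the shape shown above (a "root" node with nested "children" and a request string containing `sourceBuilder={...}, searchDone=`). The helper names `find_scan_request` and `pushed_down_size` are hypothetical and not part of the plugin.

```python
import json
import re


def find_scan_request(node):
    """Depth-first search for the request string of the OpenSearchIndexScan node."""
    if node.get("name") == "OpenSearchIndexScan":
        return node["description"]["request"]
    for child in node.get("children", []):
        found = find_scan_request(child)
        if found is not None:
            return found
    return None


def pushed_down_size(explain):
    """Return the "size" sent to OpenSearch, or None if it cannot be found."""
    request = find_scan_request(explain["root"])
    if request is None:
        return None
    # Assumption: the DSL body sits between "sourceBuilder=" and ", searchDone=".
    match = re.search(r"sourceBuilder=(\{.*\}), searchDone=", request)
    if match is None:
        return None
    return json.loads(match.group(1)).get("size")


# The plan from this report, trimmed to the fields the helper reads.
explain = {
    "root": {
        "name": "ProjectOperator",
        "children": [
            {
                "name": "OpenSearchIndexScan",
                "description": {
                    "request": 'OpenSearchQueryRequest(indexName=test_0002, '
                    'sourceBuilder={"from":0,"size":200,"timeout":"1m",'
                    '"sort":[{"abletter":{"order":"desc","missing":"_last"}}]}, '
                    'searchDone=false)'
                },
                "children": [],
            }
        ],
    }
}

print(pushed_down_size(explain))  # 200 here, while the query asked for head 1000
```

The regular expression depends on the exact text layout of the request string, so treat it as a debugging aid rather than a stable interface.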

ylwu-amzn commented 2 years ago

Putting head after sort works.

POST _plugins/_ppl/_explain
{
  "query": "source=fourclass_data | sort - anomaly_type | head 10000"
}

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[anomaly_type, A, B]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=fourclass_data, sourceBuilder={"from":0,"size":10000,"timeout":"1m","_source":{"includes":["A","B","anomaly_type"],"excludes":[]},"sort":[{"anomaly_type":{"order":"desc","missing":"_last"}}]}, searchDone=false)"""
        },
        "children": []
      }
    ]
  }
}

It would be good if you could also support head after the fields command.

POST _plugins/_ppl/_explain
{  
  "query": "source=nyc_taxi | fields value, timestamp | head 1000  "
}

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[value, timestamp]"
    },
    "children": [
      {
        "name": "LimitOperator",
        "description": {
          "limit": 1000,
          "offset": 0
        },
        "children": [
          {
            "name": "ProjectOperator",
            "description": {
              "fields": "[value, timestamp]"
            },
            "children": [
              {
                "name": "OpenSearchIndexScan",
                "description": {
                  "request": """OpenSearchQueryRequest(indexName=nyc_taxi, sourceBuilder={"from":0,"size":200,"timeout":"1m","_source":{"includes":["value","timestamp"],"excludes":[]}}, searchDone=false)"""
                },
                "children": []
              }
            ]
          }
        ]
      }
    ]
  }
}

Yury-Fridlyand commented 1 year ago

I can't reproduce it when the head and sort commands are swapped in the query:

source=online | sort - all_client | head 1000 | fields all_client 

But if I move fields ahead, everything gets messed up:

source=online | fields all_client | sort - all_client | head 1000
{
    "root": {
        "name": "ProjectOperator",
        "description": {
            "fields": "[all_client]"
        },
        "children": [
            {
                "name": "LimitOperator",
                "description": {
                    "limit": 1000,
                    "offset": 0
                },
                "children": [
                    {
                        "name": "SortOperator",
                        "description": {
                            "sortList": {
                                "all_client": {
                                    "sortOrder": "DESC",
                                    "nullOrder": "NULL_LAST"
                                }
                            }
                        },
                        "children": [
                            {
                                "name": "ProjectOperator",
                                "description": {
                                    "fields": "[all_client]"
                                },
                                "children": [
                                    {
                                        "name": "OpenSearchIndexScan",
                                        "description": {
                                            "request": "OpenSearchQueryRequest(indexName=online, sourceBuilder={\"from\":0,\"size\":200,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"all_client\"],\"excludes\":[]}}, searchDone=false)"
                                        },
                                        "children": []
                                    }
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

I think the optimizer rework announced in #1752 should fix this as well.