opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
121 stars 140 forks source link

[BUG] Async query API fails to handle date fields in query results #3122

Open dai-chen opened 1 month ago

dai-chen commented 1 month ago

What is the bug?

The async query API cannot support date fields in query results. This issue arises probably because it attempts to convert each field value to internal ExprValue representation, which has limited support for different date formats.

How can one reproduce the bug?

Query result in index:
    "_source": {
          "result": [
            "{'key':100001,'status':1,'size':15.5,'agent':'AgentA','timestamp':'2024-09-24'}",
            "{'key':100002,'status':2,'size':30.7,'agent':'AgentB','timestamp':'2024-09-23'}",
            "{'key':100003,'status':3,'size':45.9,'agent':'AgentC','timestamp':'2024-09-22'}",
            "{'key':100004,'status':1,'size':25.2,'agent':'AgentD','timestamp':'2024-09-21'}",
            "{'key':100005,'status':4,'size':55.8,'agent':'AgentE','timestamp':'2024-09-20'}"
          ],
          "schema": [
            "{'column_name':'key','data_type':'long'}",
            "{'column_name':'status','data_type':'integer'}",
            "{'column_name':'size','data_type':'float'}",
            "{'column_name':'agent','data_type':'string'}",
            "{'column_name':'timestamp','data_type':'date'}"
          ],

GET _plugins/_async_query/blFQWH...==

"{\n  \"status\": 500,\n  \"error\": {\n    \"type\": \"SemanticCheckException\",\n    
\"reason\": \"There was internal problem at backend\",\n    
\"details\": \"timestamp:2024-09-24 in unsupported format, please use 
\\u0027yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]\\u0027\"\n  }\n}"

What is the expected behavior?

The async query API should correctly handle date fields, enabling users to retrieve query results in the expected format without errors.

Do you have any additional context?

A possible solution is to bypass conversion to ExprValue, as async query results are directly returned to users without further processing in the engine. In this case, ExprValue may be redundant. Instead, returning the raw results as provided by Spark could help avoid similar issues in the future. Code: https://github.com/opensearch-project/sql/blob/main/spark/src/main/java/org/opensearch/sql/spark/functions/response/DefaultSparkSqlFunctionResponseHandle.java#L65