opensearch-project / opensearch-spark

Spark Accelerator framework ; It enables secondary indices to remote data stores.
Apache License 2.0
22 stars 33 forks source link

[BUG] Flint Query cannot read DATE data type #875

Open LantaoJin opened 2 weeks ago

LantaoJin commented 2 weeks ago

What is the bug? The generated TPCH table files have uploaded to S3. For example, create a parquet table with extended S3 location:

USE tpch_sf_1;
CREATE TABLE `lineitem` (`l_orderkey` LONG, `l_partkey` LONG, `l_suppkey` LONG,
`l_linenumber` LONG, `l_quantity` LONG, `l_extendedprice` LONG,
`l_discount` LONG, `l_tax` LONG, `l_returnflag` STRING,
`l_linestatus` STRING, `l_shipdate` DATE, `l_commitdate` DATE, `l_receiptdate` DATE,
`l_shipinstruct` STRING, `l_shipmode` STRING, `l_comment` STRING)
USING parquet
LOCATION 's3://do-not-delete-flint-integ-tests-us-east-1/data/tpch/sf=1/lineitem/';

Query in Flint failed when access the DATE type field:

SELECT l_shipdate FROM myglue_test.tpch_sf_1.lineitem LIMIT 10;

or

source = myglue_test.tpch_sf_1.lineitem | head 10 | fields l_shipdate

In QueryBench, it shows Failed to query status. But no problem to access other datatype fields such as | fields l_returnflag.

The failed query works well in Spark local as follow:

CREATE TABLE `lineitem` (`l_orderkey` LONG, `l_partkey` LONG, `l_suppkey` LONG,
`l_linenumber` LONG, `l_quantity` LONG, `l_extendedprice` LONG,
`l_discount` LONG, `l_tax` LONG, `l_returnflag` STRING,
`l_linestatus` STRING, `l_shipdate` DATE, `l_commitdate` DATE, `l_receiptdate` DATE,
`l_shipinstruct` STRING, `l_shipmode` STRING, `l_comment` STRING)
USING parquet
LOCATION 'file:///Users/ltjin/Downloads/tpch/data/tpch/sf=1/lineitem/';

SELECT l_shipdate FROM lineitem LIMIT 10;

How to reproduce? Below simple query failed in Query Workbench -> SQL

select date('2010-02-11')
noCharger commented 1 week ago
select date('2010-02-11')

result is

          "result": [
            "{'2010-02-11':'2010-02-11'}"
          ],
noCharger commented 1 week ago

on sql plugin it returns 500

"{\n \"status\": 500,\n \"error\": {\n \"type\": \"SemanticCheckException\",\n \"reason\": \"There was internal problem at backend\",\n \"details\": \"timestamp:2010-02-11 in unsupported format, please use \u0027yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]\u0027\"\n }\n}"

noCharger commented 1 week ago

on workbench it is not handled - seems to be a bug https://github.com/opensearch-project/dashboards-query-workbench/blame/70742b8cd3d0f3bbec2373366673703b337a29a3/common/utils/async_query_helpers.ts#L82-L137

noCharger commented 1 week ago
[2024-11-13T20:03:14,930][ERROR][o.o.s.s.r.RestAsyncQueryManagementAction] [51dfc4a70ba6543b484007138c2fd46a] Error happened during request handling
org.opensearch.sql.exception.SemanticCheckException: timestamp:2010-02-11 in unsupported format, please use 'yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]'
        at org.opensearch.sql.data.model.ExprTimestampValue.<init>(ExprTimestampValue.java:39)
        at org.opensearch.sql.spark.functions.response.DefaultSparkSqlFunctionResponseHandle.extractRow(DefaultSparkSqlFunctionResponseHandle.java:93)
        at org.opensearch.sql.spark.functions.response.DefaultSparkSqlFunctionResponseHandle.constructIteratorAndSchema(DefaultSparkSqlFunctionResponseHandle.java:58)
        at org.opensearch.sql.spark.functions.response.DefaultSparkSqlFunctionResponseHandle.<init>(DefaultSparkSqlFunctionResponseHandle.java:47)
        at org.opensearch.sql.spark.asyncquery.AsyncQueryExecutorServiceImpl.getAsyncQueryResults(AsyncQueryExecutorServiceImpl.java:77)
        at org.opensearch.sql.spark.transport.TransportGetAsyncQueryResultAction.doExecute(TransportGetAsyncQueryResultAction.java:55)
        at org.opensearch.sql.spark.transport.TransportGetAsyncQueryResultAction.doExecute(TransportGetAsyncQueryResultAction.java:28)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218)
        at org.opensearch.indexmanagement.controlcenter.notification.filter.IndexOperationActionFilter.apply(IndexOperationActionFilter.kt:39)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216)
        at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216)
        at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:395)
        at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:165)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216)
        at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216)
        at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188)
        at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107)
        at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110)
        at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97)
        at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:476)
        at org.opensearch.sql.spark.rest.RestAsyncQueryManagementAction.lambda$executeGetAsyncQueryResultRequest$3(RestAsyncQueryManagementAction.java:165)
        at org.opensearch.sql.datasources.utils.Scheduler.lambda$withCurrentContext$0(Scheduler.java:30)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
[2024-11-13T20:03:14,930][INFO ][c.a.c.e.logger           ] [51dfc4a70ba6543b484007138c2fd46a] GET /_plugins/_async_query/VHppcjZmcU5CRG15Z2x1ZV90ZXN0 pretty=true 500 INTERNAL_SERVER_ERROR 246 72
noCharger commented 1 week ago

related sql code https://github.com/opensearch-project/sql/blob/main/spark/src/main/java/org/opensearch/sql/spark/functions/response/DefaultSparkSqlFunctionResponseHandle.java#L88-L93