opensearch-project / observability

Visualize and explore your logs, traces and metrics data in OpenSearch Dashboards
https://opensearch.org/docs/latest/observability-plugin/index/
Apache License 2.0

[BUG] 500 when retrieving direct query results from OpenSearch index results in query hanging in "Running" state #1854

Open engechas opened 1 month ago

engechas commented 1 month ago

What is the bug? This is a 2-part bug related to direct query.

Part 1 When the SQL plugin frontend receives a 500 while retrieving the query results from the local index, the Log Explorer page gets stuck in the “Running” state indefinitely.

Part 2 When opensearch-spark writes data with duplicate column names to OpenSearch, an exception is thrown when the SQL plugin tries to parse the data back out of the index. The exception names the duplicate key as the cause (see the trace below; a standalone sketch of the failure follows it).

[2024-07-17T18:53:04,463][ERROR][o.o.s.s.r.RestAsyncQueryManagementAction] [7817f72593045b67563a3bef3abef663] Error happened during request handling
org.json.JSONException: Duplicate key "ip" at 176 [character 177 line 1]
        at org.json.JSONTokener.syntaxError(JSONTokener.java:503)
        at org.json.JSONObject.<init>(JSONObject.java:234)
        at org.json.JSONObject.<init>(JSONObject.java:402)
        at org.opensearch.sql.spark.functions.response.DefaultSparkSqlFunctionResponseHandle.constructIteratorAndSchema(DefaultSparkSqlFunctionResponseHandle.java:57)
        at org.opensearch.sql.spark.functions.response.DefaultSparkSqlFunctionResponseHandle.<init>(DefaultSparkSqlFunctionResponseHandle.java:47)
        at org.opensearch.sql.spark.asyncquery.AsyncQueryExecutorServiceImpl.getAsyncQueryResults(AsyncQueryExecutorServiceImpl.java:77)
        at org.opensearch.sql.spark.transport.TransportGetAsyncQueryResultAction.doExecute(TransportGetAsyncQueryResultAction.java:55)
        at org.opensearch.sql.spark.transport.TransportGetAsyncQueryResultAction.doExecute(TransportGetAsyncQueryResultAction.java:28)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218)
        at org.opensearch.indexmanagement.controlcenter.notification.filter.IndexOperationActionFilter.apply(IndexOperationActionFilter.kt:39)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216)
        at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216)
        at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:395)
        at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:165)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216)
        at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216)
        at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188)
        at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107)
        at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110)
        at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97)
        at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:476)
        at org.opensearch.sql.spark.rest.RestAsyncQueryManagementAction.lambda$executeGetAsyncQueryResultRequest$3(RestAsyncQueryManagementAction.java:165)
        at org.opensearch.sql.datasources.utils.Scheduler.lambda$withCurrentContext$0(Scheduler.java:30)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
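
The parse failure is easy to reproduce outside the plugin. Below is a minimal sketch of what DefaultSparkSqlFunctionResponseHandle.constructIteratorAndSchema runs into, using the same org.json library; the two-key row is a hypothetical stand-in for the actual stored result document:

    import org.json.JSONObject;

    public class DuplicateKeyRepro {
        public static void main(String[] args) {
            // org.json rejects duplicate keys at construction time, which is
            // exactly the failure in the trace above when two selected leaf
            // fields share the name "ip".
            String row = "{\"ip\": \"1.2.3.4\", \"ip\": \"5.6.7.8\"}";
            new JSONObject(row); // throws org.json.JSONException: Duplicate key "ip"
        }
    }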

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Create a Glue table with two nested columns that have leaf fields with the same name. Example:
    {
      "nested": {
        "field": "value"
      },
      "other_nested": {
        "field": "value"
      }
    }
  2. Query the table with the query below:
    SELECT nested.field, other_nested.field FROM <table>
  3. Check the network tab of the browser dev tools to confirm the response from /_dashboards/api/observability/query/jobs/<job id> has a 500 status
  4. Verify the frontend still shows the query as running
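
The collision happens because Spark SQL names a struct-field projection by its leaf name, so both selected columns come back as "field" and the serialized result row contains a duplicate key; aliasing the columns (e.g. SELECT nested.field AS nested_field, ...) sidesteps it on the query side. On the parsing side, a hedged sketch of one possible direction, assuming the stored result itself cannot be changed: a duplicate-tolerant parser such as Jackson accepts such a row by default, at the cost of silently keeping only the last value:

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class DuplicateKeyTolerantParse {
        public static void main(String[] args) throws Exception {
            String row = "{\"field\": \"a\", \"field\": \"b\"}";
            // Unlike org.json, Jackson only rejects duplicate keys when
            // JsonParser.Feature.STRICT_DUPLICATE_DETECTION is enabled;
            // by default the last occurrence wins.
            JsonNode node = new ObjectMapper().readTree(row);
            System.out.println(node.get("field")); // prints "b" -- "a" is lost
        }
    }

Whichever parser behavior is chosen, Part 1 remains a separate frontend issue: the 500 should be surfaced rather than leaving the query in "Running".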

What is the expected behavior? The frontend should surface the failed result retrieval (for example, an error state in Log Explorer) instead of showing the query as "Running" indefinitely, and a result set with duplicate leaf column names should either parse successfully or fail with a clear error.

What is your host/environment?

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? Add any other context about the problem.

dblock commented 3 weeks ago

[Catch All Triage - 1, 2, 3, 4, 5]