[BUG][SanityTest] Spark job failure cannot be caught in Flint

LantaoJin commented 1 month ago

What is the bug? When a Spark job fails, the PPL query hangs forever instead of returning an error.

How can one reproduce the bug? Steps to reproduce the behavior:

1. Go to the Workbench page, for example https://search-flint05-sanity-h5if7yelmxws5hc35cap2lauf4.us-east-1.es-integ.amazonaws.com/_dashboards/app/opensearch-query-workbench#/
2. Execute a PPL query that causes the Spark job to fail, such as the following:
   ```
   source = myglue_test.default.http_logs | stats avg(size) by clientip
   ```
3. Because the query groups by a high-cardinality column, it triggers a write failure.
4. The query keeps running forever in Workbench.

What is the expected behavior? The query should fail and show an error message.
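To make the expected behavior concrete, here is a minimal sketch of a client-side wait loop that treats a failed job as a terminal state and gives up after a timeout rather than polling forever. This is not Flint or Workbench code. The state names (`SUCCESS`, `FAILED`, `CANCELLED`), the `error` field, and the `get_status` callback are all hypothetical and only illustrate the idea.

```python
import time
from typing import Callable, Dict

# Hypothetical state names; the real ones may differ.
TERMINAL_FAILURE_STATES = {"FAILED", "CANCELLED"}


class QueryFailedError(Exception):
    """Raised when the backing job reports a terminal failure."""


def wait_for_query(
    get_status: Callable[[], Dict[str, str]],
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll get_status() until success, failure, or timeout.

    get_status is a hypothetical callback returning a dict such as
    {"state": "RUNNING"} or {"state": "FAILED", "error": "..."}.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        state = status.get("state", "").upper()
        if state == "SUCCESS":
            return status
        if state in TERMINAL_FAILURE_STATES:
            # Surface the job error to the caller instead of continuing to poll.
            raise QueryFailedError(status.get("error", "query failed without an error message"))
        if clock() >= deadline:
            raise TimeoutError(f"query still in state {state!r} after {timeout_s}s")
        sleep(poll_interval_s)
```

With a loop like this, the reproduction above would end with a `QueryFailedError` carrying the write failure, or at worst a `TimeoutError`, rather than a spinner that never stops.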

What is your host/environment?

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? Add any other context about the problem.

LantaoJin commented 1 month ago

@ykmr1224 @noCharger, please help transfer this issue to the proper project if it doesn't belong here.

dblock commented 1 month ago

[Catch All Triage - 1, 2]

opensearch-project / opensearch-spark