opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices for remote data stores.
Apache License 2.0

[BUG][SanityTest] Spark job failure cannot be caught in Flint #733

Open LantaoJin opened 1 month ago

LantaoJin commented 1 month ago

What is the bug? When a Spark job fails, the PPL query hangs forever instead of reporting the failure.

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Go to the Workbench page, for example https://search-flint05-sanity-h5if7yelmxws5hc35cap2lauf4.us-east-1.es-integ.amazonaws.com/_dashboards/app/opensearch-query-workbench#/

  2. Execute a PPL query that causes the Spark job to fail, such as the following:

    source = myglue_test.default.http_logs | stats avg(size) by clientip

    Grouping by a high-cardinality column triggers a write failure.

    Screenshot 2024-10-01 at 17 33 09
  3. The query keeps running forever in Workbench.

    Screenshot 2024-10-03 at 11 41 57

What is the expected behavior? The query should fail with an error message.
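
To make the expected behavior concrete, here is a minimal sketch of a client-side wait loop that surfaces a failed job instead of waiting forever. It is not Flint's or Workbench's actual code: `fetch_status`, the state names (`success`, `failed`, `cancelled`, `timeout`), and the `error` field are all hypothetical and stand in for whatever the real status lookup returns.

```python
import time

# Hypothetical terminal failure states; the real state names may differ.
TERMINAL_FAILURES = {"failed", "cancelled", "timeout"}


class QueryFailed(Exception):
    """Raised when the query reaches a terminal failure state."""


def wait_for_query(fetch_status, timeout_s=60.0, interval_s=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the query succeeds, fails, or times out.

    fetch_status is a hypothetical callable returning a dict such as
    {"state": "running"} or {"state": "failed", "error": "..."}.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        state = str(status.get("state", "")).lower()
        if state == "success":
            return status
        if state in TERMINAL_FAILURES:
            # Surface the job error to the caller instead of polling again.
            raise QueryFailed(status.get("error") or f"query ended in state {state!r}")
        if clock() >= deadline:
            # Guard against a status that never becomes terminal.
            raise TimeoutError(f"query still in state {state!r} after {timeout_s}s")
        sleep(interval_s)
```

With a loop like this, a job that ends in a failure state produces an error for the user, and the deadline bounds the wait even if the failure is never recorded.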

What is your host/environment?

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? Add any other context about the problem.

LantaoJin commented 1 month ago

@ykmr1224 @noCharger, please help transfer this issue to the proper project if it doesn't belong in this one.

dblock commented 1 month ago

[Catch All Triage - 1, 2]