opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0
25 stars, 33 forks

[BUG][SanityTest] EMR-S batch jobs automatically complete as success if no query is submitted for a while #735

Open LantaoJin opened 1 month ago

LantaoJin commented 1 month ago

What is the bug? The inner EMR-S batch job completes with a success status when no Spark job is triggered for a while (there appears to be a TTL of about 5 minutes). The next query then relaunches an EMR-S batch job, which makes it very slow.

Screenshot 2024-10-03 at 12 15 57

How can one reproduce the bug?

  1. Go to the Workbench page, for example https://search-flint05-sanity-h5if7yelmxws5hc35cap2lauf4.us-east-1.es-integ.amazonaws.com/_dashboards/app/opensearch-query-workbench#/
  2. Execute the same PPL query several times; this works fine. For example, from the second run onward the following query takes only seconds:
    source = myglue_test.default.nested | head 10
  3. Wait for a while, then execute the query again. This triggers a new batch job in EMR-S and is very slow.
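
To narrow down the idle period after which the batch job goes away, the steps above could be automated along these lines. This is only a rough sketch: `probe_idle_latency`, `first_slow_gap`, and the `run_query` callable are hypothetical names, not part of opensearch-spark or EMR-S. `run_query` is assumed to submit the PPL query by whatever means is available and to block until the result is back.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def probe_idle_latency(
    run_query: Callable[[], None],
    idle_gaps_s: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[float, float]]:
    """Run the query once to warm up, then once more after each idle gap.

    Returns (idle_gap_seconds, query_latency_seconds) pairs.
    run_query is a caller-supplied (hypothetical) function that submits the
    query and blocks until it finishes.
    """
    run_query()  # warm-up run; the first run is expected to be slow
    results = []
    for gap in idle_gaps_s:
        sleep(gap)  # stay idle, i.e. submit nothing during this period
        start = clock()
        run_query()
        results.append((gap, clock() - start))
    return results


def first_slow_gap(
    results: List[Tuple[float, float]], slow_threshold_s: float
) -> Optional[float]:
    """Return the smallest idle gap whose latency exceeded the threshold."""
    slow = [gap for gap, latency in results if latency > slow_threshold_s]
    return min(slow) if slow else None
```

Calling `probe_idle_latency(run_query, [60, 240, 360])` and passing the result to `first_slow_gap` with a threshold well above the normal "seconds" latency would show roughly where the suspected ~5 minute TTL kicks in, assuming the slow runs really are caused by a relaunched batch job.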

What is the expected behavior? Increase the TTL if there is one, or identify why the batch job completes so soon.

What is your host/environment?

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? Add any other context about the problem.

dblock commented 1 month ago

[Catch All Triage - 1, 2]

dblock commented 1 month ago

Is this a bug in opensearch-spark or in EMR? If it's the latter, it doesn't belong here.