[BUG][SanityTest] EMR-S batch jobs automatically became success if no query submit for a while

LantaoJin commented 1 month ago

What is the bug? The inner EMR-S batch job completes with a success status when no Spark job is triggered for a while (there seems to be a TTL of about 5 minutes). The next query then re-launches an EMR-S batch job, which makes that query very slow.

How can one reproduce the bug?

Go to the Workbench page, for example https://search-flint05-sanity-h5if7yelmxws5hc35cap2lauf4.us-east-1.es-integ.amazonaws.com/_dashboards/app/opensearch-query-workbench#/
Executing the same PPL query multiple times works fine. For example, from the second run onward the following query takes only seconds:
```
source = myglue_test.default.nested | head 10
```
But if you wait for a while and then execute it again, it triggers a new batch job in EMR-S and is very slow.
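
To make the suspected behavior concrete, here is a minimal Python sketch of the pattern described above. It is only a model, not code from opensearch-spark or EMR-S: the 5-minute idle TTL is the guess from this report, `cold_start_flags` is a hypothetical helper, and query run time is ignored for simplicity.

```python
from typing import List

# Assumption: idle TTL of roughly 5 minutes, as guessed in this report.
ASSUMED_IDLE_TTL_SECONDS = 5 * 60


def cold_start_flags(submit_times: List[float],
                     ttl: float = ASSUMED_IDLE_TTL_SECONDS) -> List[bool]:
    """For each query submit time (seconds), return True if the query
    would need a newly launched batch job under the assumed idle TTL."""
    flags = []
    previous = None
    for t in submit_times:
        # The first query always launches a job; later queries do so only
        # when the gap since the previous query exceeds the TTL.
        flags.append(previous is None or (t - previous) > ttl)
        previous = t
    return flags


# Three quick queries, then one more after a 10-minute pause.
print(cold_start_flags([0, 30, 60, 660]))  # [True, False, False, True]
```

Under this model, only the first query and the one after the pause pay the batch job launch cost, which matches the slow/fast/slow pattern seen in the Workbench.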

What is the expected behavior? Increase the TTL if there is one, or identify why the batch job completes so soon.

dblock commented 1 month ago

[Catch All Triage - 1, 2]

dblock commented 1 month ago

Is this a bug in opensearch-spark or EMR? If it's the latter it doesn't belong here.

opensearch-project / opensearch-spark

[BUG][SanityTest] EMR-S batch jobs automatically became success if no query submit for a while #735