kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Robustness to driver pod taking time to create #2315

Open Tom-Newton opened 2 weeks ago

Tom-Newton commented 2 weeks ago

Purpose of this PR

Improve reliability. Closes: #2302

Proposed changes:

Change Category

Checklist

Additional Notes

Attached are logs from a real example on our production cluster, where the Spark application was saved by the grace period added in this PR: Explore-logs-2024-11-11 12_09_13.txt

google-oss-prow[bot] commented 2 weeks ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the `lgtm` label, please assign chenyi015 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/kubeflow/spark-operator/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

Tom-Newton commented 3 days ago

Sorry for the direct ping. @ChenYi015 are you the right person to review this?