[Bug] RayJob does not shut down the submitter pod properly

Search before asking

[X] I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

With KubeRay v1.0.0, a RayJob that requests a lot of resources and runs for a long time (more than half an hour) sometimes finishes its work without the log output completing: the normal success message is never printed, although the end of the job's output can be seen in the dashboard. When this happens, the RayJob stays stuck and the submitter pod is never cleaned up.

The status information returned by KubeRay is shown in the screenshot below:

[image: img_v3_02ef_97fbe77b-d958-4ebb-929c-31daf282b13g]

After I upgraded to v1.1.1, not only the submitter pod but also the head node was not cleaned up. The jobDeploymentStatus field stayed at Running, and nothing else in the status changed.
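For anyone triaging a similar hang, here is a minimal sketch of how the mismatch could be detected from the RayJob status, for example from the `.status` block of `kubectl get rayjob <name> -o json`. The helper name `looks_stuck` is hypothetical, and the sets of status values are my assumptions about what the operator reports, not taken from the operator source. Whether jobStatus actually reaches a terminal value in my case is not confirmed above, so the sample input is illustrative only.

```python
# Hypothetical helper: flags a RayJob whose Ray job has ended but whose
# deployment status has not moved on. The value sets below are assumptions.
TERMINAL_JOB_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}
FINISHED_DEPLOYMENT_STATUSES = {"Complete", "Failed"}


def looks_stuck(status: dict) -> bool:
    """Return True if the job has ended but the deployment has not finished."""
    job_status = status.get("jobStatus", "")
    deployment_status = status.get("jobDeploymentStatus", "")
    return (
        job_status in TERMINAL_JOB_STATUSES
        and deployment_status not in FINISHED_DEPLOYMENT_STATUSES
    )


if __name__ == "__main__":
    # Illustrative status block, not copied from a real cluster.
    sample = {"jobStatus": "SUCCEEDED", "jobDeploymentStatus": "Running"}
    print(looks_stuck(sample))
```

Polling this check after the job's output has finished in the dashboard would show how long the RayJob stays in the inconsistent state.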

Reproduction script

This is easy to reproduce: run a RayJob that requests a lot of resources and runs for a long time.

Anything else

No response

Are you willing to submit a PR?

[ ] Yes I am willing to submit a PR!
