hy00nc opened this issue 1 month ago
Do you mean that you want to keep the Pod objects until the TTL expires?
Or do you want to keep them running?
@alculquicondor, thanks for the reply. I want the MPIJob resource itself to be deleted after the TTL expires, just like `ttlSecondsAfterFinished` works in MPIJob v1. In the current implementation, the MPIJob is never cleaned up unless it is deleted explicitly, right?
Oh, gotcha. I don't know if that's how the other Kubeflow APIs work. If they do, we can bring MPIJob back to parity.
Indeed, the other Kubeflow jobs are deleted once `ttlSecondsAfterFinished` has elapsed, like this:
Would it make sense to extend `activeDeadlineSeconds` and `backoffLimit` as well? I believe these are also currently limited to the launcher, whereas other Kubeflow jobs apply them at the job level.
Those should be fine in the launcher Job alone, because the launcher Job is what controls the execution. If it finishes as Failed, the remaining pods terminate too, IIRC.
Do we have a plan to extend `ttlSecondsAfterFinished` to the MPIJob level, not just the launcher?