Closed leileiwan closed 4 years ago
SGTM. But why did the CI pass without the change?
/cc @johnugeorge
/kind bug
@gaocegege
Because in this test case, we set the service for worker pod. Then it will run the function "deletePodsAndServices " which will delete services of all pods with function "DeleteService"(FakeServiceControl). In the "FakeServiceControl ",we delete a service just append it ,so the test case can pass coincidentally.
Even the CI can pass this test case, but it‘s confusing. I suggest to change it.
(1) not create service for worker pod
(2) delete service just append master service
(3) expect deleted service equal with len(master pods)
func TestDeletePodsAndServices(t *testing.T) {
...
testutil.SetServices(serviceIndexer, tc.job, testutil.LabelWorker, tc.activeWorkerServices, t)
testutil.SetServices(serviceIndexer, tc.job, testutil.LabelMaster, tc.activeMasterServices, t)
...
if len(fakeServiceControl.DeleteServiceName) != tc.expectedPodDeletions {
t.Errorf("%s: unexpected number of service deletes. Expected %d, saw %d\n",
tc.description, tc.expectedPodDeletions, len(fakeServiceControl.DeleteServiceName))
}
...
}
func (pc *PyTorchController) deletePodsAndServices(job *pyv1.PyTorchJob, pods []*v1.Pod) error {
if len(pods) == 0 {
return nil
}
// Delete nothing when the cleanPodPolicy is None or Running.
if *job.Spec.CleanPodPolicy == common.CleanPodPolicyNone ||
*job.Spec.CleanPodPolicy == common.CleanPodPolicyRunning {
return nil
}
for _, pod := range pods {
if err := pc.PodControl.DeletePod(pod.Namespace, pod.Name, job); err != nil {
return err
}
// Pod and service have the same name, thus the service could be deleted using pod's name.
if err := pc.ServiceControl.DeleteService(pod.Namespace, pod.Name, job); err != nil {
return err
}
}
return nil
}
func (f *FakeServiceControl) DeleteService(namespace string, serviceID string, object runtime.Object) error {
f.Lock()
defer f.Unlock()
f.DeleteServiceName = append(f.DeleteServiceName, serviceID)
if f.Err != nil {
return f.Err
}
return nil
}
Gotcha. Make sense.
@leileiwan Would you like to help us with this issue? Or I can fix it.
@gaocegege Thanks for you give me a chance to fix it,I will try it.
@leileiwan
Do not be so polite.
Thanks for your contribution! :tada: :+1:
Great. Thanks @leileiwan for your contribution
hi, In the file "pkg/controller.v1/pytorch/job_test.go", the test case "TestDeletePodsAndServices " may have something error,
Because,now only master pod have service,so I belive that the number of service deletes should be len(masterPods),not include the number of worker pods.