openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0
8.49k stars 4.7k forks source link

TestSimpleImageChangeBuildTrigger is racy now #1443

Closed smarterclayton closed 9 years ago

smarterclayton commented 9 years ago
failed  TestSimpleImageChangeBuildTrigger
W0324 18:09:02.936033   51103 master.go:186] Portal net unspecified. Defaulting to 10.0.0.0/24.
I0324 18:09:02.936121   51103 util.go:410] Choosing interface en0 for from-host portals
I0324 18:09:02.936165   51103 util.go:415] Interface en0 = fe80::7a31:c1ff:feb7:6042/64
I0324 18:09:02.936181   51103 master.go:214] Will report fe80::7a31:c1ff:feb7:6042 as public IP address.
I0324 18:09:02.936228   51103 master.go:260] Setting master service IPs based on PortalNet subnet to "10.0.0.1" (read-only) and "10.0.0.2" (read-write).
I0324 18:09:02.980026   51103 controller.go:101] Updated all configs for trigger on imageRepo integration/test-image-trigger-repo
I0324 18:09:02.982101   51103 image_change_controller.go:39] Build image change controller detected imagerepo change registry:8080/openshift/test-image-trigger
I0324 18:09:02.982110   51103 image_change_controller.go:83] Running build for buildConfig test-build-cfg in namespace integration
I0324 18:09:02.982899   51103 generate.go:39] Substituting registry:8080/openshift/test-image-trigger:latest for registry:8080/openshift/test-image-trigger
I0324 18:09:02.991200   51103 controller.go:42] Handling build test-build-cfg-61617073-d272-11e4-ba17-7831c1b76042
--- FAIL: TestSimpleImageChangeBuildTrigger (0.08s)
    deploy_trigger_test.go:327: Starting test openshift
    imagechange_buildtrigger_test.go:72: expected watch event type MODIFIED, got ADDED
FAIL
failed  TestSimpleImageChangeTrigger
W0324 18:09:03.100113   51115 master.go:186] Portal net unspecified. Defaulting to 10.0.0.0/24.
I0324 18:09:03.100261   51115 util.go:410] Choosing interface en0 for from-host portals
I0324 18:09:03.100283   51115 util.go:415] Interface en0 = fe80::7a31:c1ff:feb7:6042/64
I0324 18:09:03.100293   51115 master.go:214] Will report fe80::7a31:c1ff:feb7:6042 as public IP address.
I0324 18:09:03.100311   51115 master.go:260] Setting master service IPs based on PortalNet subnet to "10.0.0.1" (read-only) and "10.0.0.2" (read-write).
I0324 18:09:03.139732   51115 image_change_controller.go:39] Build image change controller detected imagerepo change registry:8080/openshift/test-image
I0324 18:09:03.151530   51115 controller.go:43] Ignoring config integration/image-deploy-config:0; no change triggers detected
I0324 18:09:03.152474   51115 controller.go:38] Detecting changed images for deploymentConfig integration/image-deploy-config:0
I0324 18:09:03.152485   51115 controller.go:48] Found matching ImageChange trigger for deploymentConfig integration/image-deploy-config:0: &api.DeploymentTriggerImageChangeParams{Automatic:true, ContainerNames:[]string{"container-1"}, RepositoryName:"registry:8080/openshift/test-image", From:api.ObjectReference{Kind:"", Namespace:"", Name:"", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}, Tag:"latest"}
I0324 18:09:03.152534   51115 controller.go:54] Processing image triggers for deploymentConfig integration/image-deploy-config:0
I0324 18:09:03.152547   51115 controller.go:79] Container container-1 for config integration/image-deploy-config:0: image id changed from "ref-1" to ""; regenerating config
I0324 18:09:03.152664   51115 controller.go:40] Waiting for first version of integration/image-deploy-config:0
I0324 18:09:03.176931   51115 controller.go:43] Ignoring config integration/image-deploy-config:1; no change triggers detected
E0324 18:09:03.188003   51115 factory.go:77] fatal error handling imageRepository: couldn't update some deploymentConfigs for trigger on imageRepo integration/test-image-repo
I0324 18:09:03.197184   51115 controller.go:64] Created deployment for config integration/image-deploy-config:1
I0324 18:09:03.210321   51115 image_change_controller.go:39] Build image change controller detected imagerepo change registry:8080/openshift/test-image
I0324 18:09:03.212128   51115 controller.go:38] Detecting changed images for deploymentConfig integration/image-deploy-config:1
I0324 18:09:03.212136   51115 controller.go:48] Found matching ImageChange trigger for deploymentConfig integration/image-deploy-config:1: &api.DeploymentTriggerImageChangeParams{Automatic:true, ContainerNames:[]string{"container-1"}, RepositoryName:"registry:8080/openshift/test-image", From:api.ObjectReference{Kind:"", Namespace:"", Name:"", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}, Tag:"latest"}
I0324 18:09:03.212168   51115 controller.go:54] Processing image triggers for deploymentConfig integration/image-deploy-config:1
I0324 18:09:03.212179   51115 controller.go:79] Container container-1 for config integration/image-deploy-config:1: image id changed from "ref-1" to ""; regenerating config
I0324 18:09:03.257557   51115 controller.go:43] Ignoring config integration/image-deploy-config:2; no change triggers detected
I0324 18:09:03.277675   51115 controller.go:94] Updated deploymentConfig integration/image-deploy-config:1 in response to image change trigger
I0324 18:09:03.277686   51115 controller.go:101] Updated all configs for trigger on imageRepo integration/test-image-repo
--- FAIL: TestSimpleImageChangeTrigger (0.22s)
    deploy_trigger_test.go:327: Starting test openshift
    deploy_trigger_test.go:160: expected watch event type ADDED, got MODIFIED
FAIL

It looks the order of the replication controller resize is out of order with the creates. Tests need to be more resilient. @bparees @ironcladlou

bparees commented 9 years ago

@smarterclayton that's definitely not the issue for the imagechange_buildtrigger_test, can't speak to the deploy_trigger_test.

for imagechange_buildtrigger_test, we create a build, watch for an ADD event, and then watch for a MODIFIED event... we're successfully getting the ADD event, so i can't imagine why we're getting a second ADD event instead of a MODIFIED event...that would mean a 2nd build is being created, unless events are being replayed for some reason?

bparees commented 9 years ago

@smarterclayton also does this issue mean that https://github.com/openshift/origin/pull/1432 is absolved from the blame for this race?

smarterclayton commented 9 years ago

Yes

On Mar 24, 2015, at 6:52 PM, Ben Parees notifications@github.com wrote:

@smarterclayton also does this issue mean that #1432 is absolved from the blame for this race?

— Reply to this email directly or view it on GitHub.

smarterclayton commented 9 years ago

It probably means someone is modifying the build in quick succession. It's possible our code was too slow before to trigger the race.

On Mar 24, 2015, at 6:40 PM, Ben Parees notifications@github.com wrote:

@smarterclayton that's definitely not the issue for the imagechange_buildtrigger_test, can't speak to the deploy_trigger_test.

for imagechange_buildtrigger_test, we create a build, watch for an ADD event, and then watch for a MODIFIED event... we're successfully getting the ADD event, so i can't imagine why we're getting a second ADD event instead of a MODIFIED event...that would mean a 2nd build is being created, unless events are being replayed for some reason?

— Reply to this email directly or view it on GitHub.

bparees commented 9 years ago

why would we get an ADD event if it was someone modifying the build?

it's absolutely the case that we are now ADDing and then immmediately MODIFYing the build, this is in part because the buildcontroller didnt' used to be running in this test (as @ironcladlou has noted), but that would result in ADD+MODIFIED, not ADD+ADD which is what the test case is seeing.

smarterclayton commented 9 years ago

Add add means two builds are created

On Mar 24, 2015, at 7:08 PM, Ben Parees notifications@github.com wrote:

why would we get an ADD event if it was someone modifying the build?

it's absolutely the case that we are now ADDing and then immmediately MODIFYing the build, this is in part because the buildcontroller didnt' used to be running in this test (as @ironcladlou has noted), but that would result in ADD+MODIFIED, not ADD+ADD which is what the test case is seeing.

— Reply to this email directly or view it on GitHub.

bparees commented 9 years ago

yes.... that's what i said " so i can't imagine why we're getting a second ADD event instead of a MODIFIED event...that would mean a 2nd build is being created, unless events are being replayed for some reason?"

bparees commented 9 years ago

only way to find out is to dump the two build events and see what's in them. if you can easily recreate it, can you do that? i have not recreated this.

smarterclayton commented 9 years ago
Rufailed  TestSimpleImageChangeBuildTrigger
W0324 19:22:47.983504   70977 master.go:186] Portal net unspecified. Defaulting to 10.0.0.0/24.
I0324 19:22:47.983639   70977 util.go:410] Choosing interface en0 for from-host portals
I0324 19:22:47.983657   70977 util.go:415] Interface en0 = fe80::7a31:c1ff:feb7:6042/64
I0324 19:22:47.983668   70977 master.go:214] Will report fe80::7a31:c1ff:feb7:6042 as public IP address.
I0324 19:22:47.983690   70977 master.go:260] Setting master service IPs based on PortalNet subnet to "10.0.0.1" (read-only) and "10.0.0.2" (read-write).
I0324 19:22:48.033349   70977 image_change_controller.go:39] Build image change controller detected imagerepo change registry:8080/openshift/test-image-trigger
I0324 19:22:48.033363   70977 image_change_controller.go:83] Running build for buildConfig test-build-cfg in namespace integration
I0324 19:22:48.034108   70977 generate.go:39] Substituting registry:8080/openshift/test-image-trigger:latest for registry:8080/openshift/test-image-trigger
I0324 19:22:48.036190   70977 controller.go:101] Updated all configs for trigger on imageRepo integration/test-image-trigger-repo
I0324 19:22:48.043169   70977 controller.go:42] Handling build test-build-cfg-aeea56a5-d27c-11e4-ba17-7831c1b76042
E0324 19:22:48.059616   70977 factory.go:148] fatal error handling ImageRepository change: Error updating buildConfig test-build-cfg with new LastTriggeredImageID
I0324 19:22:48.060846   70977 controller.go:122] Build pod already existed: &api.Pod{TypeMeta:api.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"build-test-build-cfg-aeea56a5-d27c-11e4-ba17-7831c1b76042", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", CreationTimestamp:util.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*util.Time)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, Spec:api.PodSpec{Volumes:[]api.Volume{api.Volume{Name:"docker-socket", VolumeSource:api.VolumeSource{HostPath:(*api.HostPathVolumeSource)(0xc2087c2270), EmptyDir:(*api.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*api.GCEPersistentDiskVolumeSource)(nil), GitRepo:(*api.GitRepoVolumeSource)(nil), Secret:(*api.SecretVolumeSource)(nil), NFS:(*api.NFSVolumeSource)(nil)}}, api.Volume{Name:"docker-cfg", VolumeSource:api.VolumeSource{HostPath:(*api.HostPathVolumeSource)(0xc2087c2280), EmptyDir:(*api.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*api.GCEPersistentDiskVolumeSource)(nil), GitRepo:(*api.GitRepoVolumeSource)(nil), Secret:(*api.SecretVolumeSource)(nil), NFS:(*api.NFSVolumeSource)(nil)}}}, Containers:[]api.Container{api.Container{Name:"docker-build", Image:"test-docker-builder", Command:[]string(nil), WorkingDir:"", Ports:[]api.ContainerPort(nil), Env:[]api.EnvVar{api.EnvVar{Name:"BUILD", Value:"{\"kind\":\"Build\",\"apiVersion\":\"v1beta1\",\"metadata\":{\"name\":\"test-build-cfg-aeea56a5-d27c-11e4-ba17-7831c1b76042\",\"namespace\":\"integration\",\"selfLink\":\"/osapi/v1beta1/watch/builds/test-build-cfg-aeea56a5-d27c-11e4-ba17-7831c1b76042\",\"uid\":\"aeea5708-d27c-11e4-ba17-7831c1b76042\",\"resourceVersion\":\"5209\",\"creationTimestamp\":\"2015-03-24T23:22:48Z\",\"labels\":{\"buildconfig\":\"test-build-cfg\",\"testlabel\":\"testvalue\"}},\"parameters\":{\"source\":{\"type\":\"Git\",\"git\":{\"uri\":\"git://github.com/openshift/ruby-hello-world.git\"},\"contextDir\":\"contextimage\"},\"strategy\":{\"type\":\"Docker\",\"dockerStrategy\":{\"contextDir\":\"contextimage\",\"baseImage\":\"registry:8080/openshift/test-image-trigger:latest\"}},\"output\":{\"to\":{\"kind\":\"ImageRepository\",\"name\":\"test-image-trigger-repo\"},\"tag\":\"outputtag\",\"dockerImageReference\":\"registry:8080/openshift/test-image-trigger:outputtag\",\"imageTag\":\"openshift/test-image-trigger:outputtag\",\"registry\":\"registry:8080\"}},\"status\":\"Pending\",\"podName\":\"build-test-build-cfg-aeea56a5-d27c-11e4-ba17-7831c1b76042\"}"}}, Resources:api.ResourceRequirements{Limits:api.ResourceList(nil)}, VolumeMounts:[]api.VolumeMount{api.VolumeMount{Name:"docker-socket", ReadOnly:false, MountPath:"/var/run/docker.sock"}, api.VolumeMount{Name:"docker-cfg", ReadOnly:true, MountPath:"/root/.dockercfg"}}, LivenessProbe:(*api.Probe)(nil), ReadinessProbe:(*api.Probe)(nil), Lifecycle:(*api.Lifecycle)(nil), TerminationMessagePath:"", Privileged:true, ImagePullPolicy:"IfNotPresent", Capabilities:api.Capabilities{Add:[]api.CapabilityType(nil), Drop:[]api.CapabilityType(nil)}}}, RestartPolicy:"Never", DNSPolicy:"", NodeSelector:map[string]string(nil), Host:""}, Status:api.PodStatus{Phase:"", Conditions:[]api.PodCondition(nil), Message:"", Host:"", HostIP:"", PodIP:"", Info:api.PodInfo(nil)}}
I0324 19:22:48.070435   70977 controller.go:62] Failed to record changes to build integration/test-build-cfg-aeea56a5-d27c-11e4-ba17-7831c1b76042: &errors.StatusError{ErrStatus:api.Status{TypeMeta:api.TypeMeta{Kind:"", APIVersion:""}, ListMeta:api.ListMeta{SelfLink:"", ResourceVersion:""}, Status:"Failure", Message:"build \"test-build-cfg-aeea56a5-d27c-11e4-ba17-7831c1b76042\" cannot be updated: 101: Compare failed ([5209 != 5214]) [5214]", Reason:"Conflict", Details:(*api.StatusDetails)(0xc208916140), Code:409}}
I0324 19:22:48.070479   70977 controller.go:42] Handling build test-build-cfg-aeeb6254-d27c-11e4-9148-7831c1b76042
I0324 19:22:48.106191   70977 controller.go:122] Build pod already existed: &api.Pod{TypeMeta:api.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"build-test-build-cfg-aeeb6254-d27c-11e4-9148-7831c1b76042", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", CreationTimestamp:util.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*util.Time)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, Spec:api.PodSpec{Volumes:[]api.Volume{api.Volume{Name:"docker-socket", VolumeSource:api.VolumeSource{HostPath:(*api.HostPathVolumeSource)(0xc208894a30), EmptyDir:(*api.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*api.GCEPersistentDiskVolumeSource)(nil), GitRepo:(*api.GitRepoVolumeSource)(nil), Secret:(*api.SecretVolumeSource)(nil), NFS:(*api.NFSVolumeSource)(nil)}}, api.Volume{Name:"docker-cfg", VolumeSource:api.VolumeSource{HostPath:(*api.HostPathVolumeSource)(0xc208894a40), EmptyDir:(*api.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*api.GCEPersistentDiskVolumeSource)(nil), GitRepo:(*api.GitRepoVolumeSource)(nil), Secret:(*api.SecretVolumeSource)(nil), NFS:(*api.NFSVolumeSource)(nil)}}}, Containers:[]api.Container{api.Container{Name:"docker-build", Image:"test-docker-builder", Command:[]string(nil), WorkingDir:"", Ports:[]api.ContainerPort(nil), Env:[]api.EnvVar{api.EnvVar{Name:"BUILD", Value:"{\"kind\":\"Build\",\"apiVersion\":\"v1beta1\",\"metadata\":{\"name\":\"test-build-cfg-aeeb6254-d27c-11e4-9148-7831c1b76042\",\"namespace\":\"integration\",\"selfLink\":\"/osapi/v1beta1/watch/builds/test-build-cfg-aeeb6254-d27c-11e4-9148-7831c1b76042\",\"uid\":\"aeeb62bd-d27c-11e4-9148-7831c1b76042\",\"resourceVersion\":\"5211\",\"creationTimestamp\":\"2015-03-24T23:22:48Z\",\"labels\":{\"buildconfig\":\"test-build-cfg\",\"testlabel\":\"testvalue\"}},\"parameters\":{\"source\":{\"type\":\"Git\",\"git\":{\"uri\":\"git://github.com/openshift/ruby-hello-world.git\"},\"contextDir\":\"contextimage\"},\"strategy\":{\"type\":\"Docker\",\"dockerStrategy\":{\"contextDir\":\"contextimage\",\"baseImage\":\"registry:8080/openshift/test-image-trigger:latest\"}},\"output\":{\"to\":{\"kind\":\"ImageRepository\",\"name\":\"test-image-trigger-repo\"},\"tag\":\"outputtag\",\"dockerImageReference\":\"registry:8080/openshift/test-image-trigger:outputtag\",\"imageTag\":\"openshift/test-image-trigger:outputtag\",\"registry\":\"registry:8080\"}},\"status\":\"Pending\",\"podName\":\"build-test-build-cfg-aeeb6254-d27c-11e4-9148-7831c1b76042\"}"}}, Resources:api.ResourceRequirements{Limits:api.ResourceList(nil)}, VolumeMounts:[]api.VolumeMount{api.VolumeMount{Name:"docker-socket", ReadOnly:false, MountPath:"/var/run/docker.sock"}, api.VolumeMount{Name:"docker-cfg", ReadOnly:true, MountPath:"/root/.dockercfg"}}, LivenessProbe:(*api.Probe)(nil), ReadinessProbe:(*api.Probe)(nil), Lifecycle:(*api.Lifecycle)(nil), TerminationMessagePath:"", Privileged:true, ImagePullPolicy:"IfNotPresent", Capabilities:api.Capabilities{Add:[]api.CapabilityType(nil), Drop:[]api.CapabilityType(nil)}}}, RestartPolicy:"Never", DNSPolicy:"", NodeSelector:map[string]string(nil), Host:""}, Status:api.PodStatus{Phase:"", Conditions:[]api.PodCondition(nil), Message:"", Host:"", HostIP:"", PodIP:"", Info:api.PodInfo(nil)}}
I0324 19:22:48.117578   70977 controller.go:42] Handling build test-build-cfg-aeea56a5-d27c-11e4-ba17-7831c1b76042
I0324 19:22:48.117592   70977 controller.go:42] Handling build test-build-cfg-aeeb6254-d27c-11e4-9148-7831c1b76042
SIGQUIT: quit
PC=0x4147b
smarterclayton commented 9 years ago

e2e failure

Running TestSimpleImageChangeTrigger...
failed  TestSimpleImageChangeTrigger
W0325 00:21:02.103872   17365 master.go:186] Portal net unspecified. Defaulting to 10.0.0.0/24.
I0325 00:21:02.104025   17365 util.go:473] Default route transits interface "eth0"
I0325 00:21:02.104331   17365 util.go:322] Interface eth0 is up
I0325 00:21:02.104454   17365 util.go:367] Interface "eth0" has 2 addresses :[10.69.176.248/24 fe80::2000:bff:fe61:682/64].
I0325 00:21:02.104545   17365 util.go:334] Checking addr  10.69.176.248/24.
I0325 00:21:02.104568   17365 util.go:343] IP found 10.69.176.248
I0325 00:21:02.104608   17365 util.go:373] valid IPv4 address for interface "eth0" found as 10.69.176.248.
I0325 00:21:02.104628   17365 util.go:479] Choosing IP 10.69.176.248 
I0325 00:21:02.104656   17365 master.go:214] Will report 10.69.176.248 as public IP address.
I0325 00:21:02.104683   17365 master.go:260] Setting master service IPs based on PortalNet subnet to "10.0.0.1" (read-only) and "10.0.0.2" (read-write).
I0325 00:21:02.162518   17365 controller.go:40] Waiting for first version of integration/image-deploy-config:0
I0325 00:21:02.175134   17365 controller.go:43] Ignoring config integration/image-deploy-config:0; no change triggers detected
I0325 00:21:02.182286   17365 controller.go:38] Detecting changed images for deploymentConfig integration/image-deploy-config:0
I0325 00:21:02.182306   17365 controller.go:48] Found matching ImageChange trigger for deploymentConfig integration/image-deploy-config:0: &api.DeploymentTriggerImageChangeParams{Automatic:true, ContainerNames:[]string{"container-1"}, RepositoryName:"registry:8080/openshift/test-image", From:api.ObjectReference{Kind:"", Namespace:"", Name:"", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}, Tag:"latest"}
I0325 00:21:02.182371   17365 controller.go:54] Processing image triggers for deploymentConfig integration/image-deploy-config:0
I0325 00:21:02.182395   17365 controller.go:79] Container container-1 for config integration/image-deploy-config:0: image id changed from "ref-1" to ""; regenerating config
I0325 00:21:02.182449   17365 image_change_controller.go:40] Build image change controller detected imagerepo change registry:8080/openshift/test-image
--- FAIL: TestSimpleImageChangeTrigger (0.13 seconds)
    deploy_trigger_test.go:325: Starting test openshift
    deploy_trigger_test.go:138: Couldn't create updated DeploymentConfig: deploymentConfig "image-deploy-config" cannot be updated: 101: Compare failed ([589 != 592]) [592]
FAIL
Running TestSimpleImageChangeTriggerFrom...
ok      TestSimpleImageChangeTriggerFrom
ironcladlou commented 9 years ago

Deployment image change triggers will continue to be suspect until https://github.com/openshift/origin/pull/1433 is resolved (very soon).

danmcp commented 9 years ago

Still an issue?

bparees commented 9 years ago

i have a suspicion this is from the era when etcd was sending bad events.

smarterclayton commented 9 years ago

Is that like the Cambrian or the Neocene?

On May 11, 2015, at 5:54 PM, Ben Parees notifications@github.com wrote:

i have a suspicion this is from the era when etcd was sending bad events.

— Reply to this email directly or view it on GitHub.

bparees commented 9 years ago

closing due to antiquity and not seeing it lately.