submariner-io / lighthouse

DNS service discovery across connected Kubernetes clusters.
https://submariner-io.github.io/architecture/service-discovery/
Apache License 2.0

Helm jobs broke on last commit #504

Closed: dfarrell07 closed this issue 3 years ago

dfarrell07 commented 3 years ago

What happened:

It seems like the most recently merged PR broke the Lighthouse+Helm jobs.

In the flake finder, the jobs from 5 days ago and earlier were all passing:

https://github.com/submariner-io/lighthouse/actions/workflows/flake_finder.yml

https://github.com/submariner-io/lighthouse/actions/runs/731556462

The jobs from 4 days ago onward are all failing in the same way:

https://github.com/submariner-io/lighthouse/actions/runs/734764600

In the PR-triggered E2E, the PR before the one in question passed:

https://github.com/submariner-io/lighthouse/pull/501

The PR-triggered E2E on the PR in question failed, but the PR was merged:

https://github.com/submariner-io/lighthouse/pull/502

Reverting the PR fixes the Helm jobs:

https://github.com/dfarrell07/lighthouse/pull/1

Compare that to PRs against the same base, run at about the same time, where the Helm jobs fail:

https://github.com/submariner-io/lighthouse/pull/503

There were no PRs merged to the Helm repo in the relevant timeframe.

From what I can see in the logs, the nginx connectivity tests pass and the first E2E test fails.

2021-04-09T10:07:13.5731683Z [e2e]$ go test -v -timeout 30m -args -ginkgo.v -ginkgo.randomizeAllSpecs -ginkgo.trace -submariner-namespace submariner-operator -dp-context cluster1 -dp-context cluster2 -dp-context cluster3 -ginkgo.reportPassed -test.timeout 15m -ginkgo.reportFile /go/src/github.com/submariner-io/lighthouse/output/e2e-junit.xml
2021-04-09T10:07:13.5746162Z [e2e]$ tee /go/src/github.com/submariner-io/lighthouse/output/e2e-tests.log
2021-04-09T10:07:13.5769569Z [e2e]$ generate_context_flags
2021-04-09T10:07:13.5783287Z [e2e]$ generate_context_flags
2021-04-09T10:07:13.5795957Z [e2e]$ [cluster1] printf  -dp-context cluster1
2021-04-09T10:07:13.5807342Z [e2e]$ [cluster2] printf  -dp-context cluster2
2021-04-09T10:07:13.5818055Z [e2e]$ [cluster3] printf  -dp-context cluster3
2021-04-09T10:08:31.3962901Z === RUN   TestE2E
2021-04-09T10:08:31.4031540Z Running Suite: Submariner E2E suite
2021-04-09T10:08:31.4037156Z ===================================
2021-04-09T10:08:31.4039199Z Random Seed: 1617962911 - Will randomize all specs
2021-04-09T10:08:31.4040041Z Will run 15 of 15 specs
2021-04-09T10:08:31.4040353Z 
2021-04-09T10:08:31.4061802Z STEP: Creating kubernetes clients
2021-04-09T10:08:31.4745593Z STEP: Creating lighthouse clients
2021-04-09T10:08:31.4938688Z [discovery] Test Service Discovery Across Clusters when a pod tries to resolve a service in a specific remote cluster by its cluster name 
2021-04-09T10:08:31.4940034Z   should resolve the service on the specified cluster
2021-04-09T10:08:31.4941170Z   /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:75
2021-04-09T10:08:31.4942264Z STEP: Creating namespace objects with basename "discovery"
2021-04-09T10:08:31.5035065Z STEP: Generated namespace "e2e-tests-discovery-splzk" in cluster "cluster1" to execute the tests in
2021-04-09T10:08:31.5036467Z STEP: Creating namespace "e2e-tests-discovery-splzk" in cluster "cluster2"
2021-04-09T10:08:31.5276311Z STEP: Creating namespace "e2e-tests-discovery-splzk" in cluster "cluster3"
2021-04-09T10:08:31.6137826Z STEP: Creating an Nginx Deployment on "cluster1"
2021-04-09T10:08:36.7363539Z STEP: Creating a Nginx Service on "cluster1"
2021-04-09T10:08:36.7701456Z STEP: Creating serviceExport nginx-demo.e2e-tests-discovery-splzk on "cluster1"
2021-04-09T10:08:36.8030588Z STEP: Creating an Nginx Deployment on "cluster2"
2021-04-09T10:08:41.8156114Z STEP: Creating a Nginx Service on "cluster2"
2021-04-09T10:08:41.8281696Z STEP: Creating serviceExport nginx-demo.e2e-tests-discovery-splzk on "cluster2"
2021-04-09T10:08:41.8841811Z STEP: Retrieving ServiceExport nginx-demo.e2e-tests-discovery-splzk on "cluster2"
2021-04-09T10:11:51.8995875Z STEP: Deleting namespace "e2e-tests-discovery-splzk" on cluster "cluster1"
2021-04-09T10:11:51.9242669Z STEP: Deleting namespace "e2e-tests-discovery-splzk" on cluster "cluster2"
2021-04-09T10:11:51.9307508Z STEP: Deleting namespace "e2e-tests-discovery-splzk" on cluster "cluster3"
2021-04-09T10:11:51.9530563Z STEP: Retrieving EndpointSlices for "" in ns "e2e-tests-discovery-splzk" on "cluster2"
2021-04-09T10:11:51.9589337Z STEP: Retrieving EndpointSlices for "" in ns "e2e-tests-discovery-splzk" on "cluster1"
2021-04-09T10:11:51.9733184Z 
2021-04-09T10:11:51.9769178Z • Failure [200.479 seconds]
2021-04-09T10:11:51.9769861Z [discovery] Test Service Discovery Across Clusters
2021-04-09T10:11:51.9771580Z /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:40
2021-04-09T10:11:51.9772710Z   when a pod tries to resolve a service in a specific remote cluster by its cluster name
2021-04-09T10:11:51.9773922Z   /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:74
2021-04-09T10:11:51.9775008Z     should resolve the service on the specified cluster [It]
2021-04-09T10:11:51.9776101Z     /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:75
2021-04-09T10:11:51.9776691Z 
2021-04-09T10:11:51.9777515Z     Failed to retrieve ServiceExport. No ServiceExportConditions
2021-04-09T10:11:51.9778253Z     Unexpected error:
2021-04-09T10:11:51.9778825Z         <*errors.errorString | 0xc00039c0f0>: {
2021-04-09T10:11:51.9779434Z             s: "timed out waiting for the condition",
2021-04-09T10:11:51.9779855Z         }
2021-04-09T10:11:51.9780281Z         timed out waiting for the condition
2021-04-09T10:11:51.9780889Z     occurred
2021-04-09T10:11:51.9781148Z 
2021-04-09T10:11:51.9783221Z     /go/src/github.com/submariner-io/lighthouse/vendor/github.com/submariner-io/shipyard/test/e2e/framework/framework.go:488
2021-04-09T10:11:51.9783986Z 
2021-04-09T10:11:51.9784612Z     Full Stack Trace
2021-04-09T10:11:51.9785966Z     github.com/submariner-io/shipyard/test/e2e/framework.AwaitUntil(0x1553d7c, 0x16, 0xc000521098, 0x15e3408, 0x0, 0xc00069e370)
2021-04-09T10:11:51.9788058Z        /go/src/github.com/submariner-io/lighthouse/vendor/github.com/submariner-io/shipyard/test/e2e/framework/framework.go:488 +0x1c6
2021-04-09T10:11:51.9789970Z     github.com/submariner-io/lighthouse/test/e2e/framework.(*Framework).AwaitServiceExportedStatusCondition(0xc00011edc8, 0x1, 0xc0006a0740, 0xa, 0xc000695800, 0x19)
2021-04-09T10:11:51.9791806Z        /go/src/github.com/submariner-io/lighthouse/test/e2e/framework/framework.go:128 +0x25e
2021-04-09T10:11:51.9793638Z     github.com/submariner-io/lighthouse/test/e2e/discovery.RunServiceDiscoveryClusterNameTest(0xc00011edc8)
2021-04-09T10:11:51.9795501Z        /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:371 +0x490
2021-04-09T10:11:51.9796763Z     github.com/submariner-io/lighthouse/test/e2e/discovery.glob..func2.6.1()
2021-04-09T10:11:51.9798019Z        /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:76 +0x2a
2021-04-09T10:11:51.9799223Z     github.com/submariner-io/shipyard/test/e2e.RunE2ETests(0xc000347980, 0xc8d328797b)
2021-04-09T10:11:51.9800493Z        /go/src/github.com/submariner-io/lighthouse/vendor/github.com/submariner-io/shipyard/test/e2e/e2e.go:92 +0x125
2021-04-09T10:11:51.9801884Z     github.com/submariner-io/lighthouse/test/e2e.TestE2E(0xc000347980)
2021-04-09T10:11:51.9802976Z        /go/src/github.com/submariner-io/lighthouse/test/e2e/e2e_test.go:26 +0x2b
2021-04-09T10:11:51.9803709Z     testing.tRunner(0xc000347980, 0x15e33e0)
2021-04-09T10:11:51.9804295Z        /usr/lib/golang/src/testing/testing.go:1123 +0xef
2021-04-09T10:11:51.9804827Z     created by testing.(*T).Run
2021-04-09T10:11:51.9805369Z        /usr/lib/golang/src/testing/testing.go:1168 +0x2b3

https://pastebin.com/5Tbf5Xb9
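
For context, the "timed out waiting for the condition" error above is the test framework giving up after polling the exported Service's status for roughly three minutes without ever seeing any ServiceExport conditions. A rough manual equivalent of what the spec is waiting on (the resource name, namespace, and context are taken from the log above; the jsonpath is illustrative and assumes the MCS ServiceExport CRD is installed):

$ kubectl --context cluster2 -n e2e-tests-discovery-splzk \
    get serviceexport nginx-demo -o jsonpath='{.status.conditions}'
# The spec reports "No ServiceExportConditions" because this stays empty for
# the whole wait, so the framework's AwaitUntil eventually times out.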

Environment:

Lighthouse CI

dfarrell07 commented 3 years ago

It seems this is actually exposing a deeper issue: without this PR, the Lighthouse jobs run with subctl, not Helm.

tpantelis commented 3 years ago

The problem is that the Helm jobs aren't deploying the LH components. Looking at the helm install command executed by the jobs:

[lighthouse]$ [cluster2] helm --kube-context cluster2 install submariner-operator submariner-latest/submariner-operator \
    --create-namespace --namespace submariner-operator ... \
    --set broker.globalnet=false \
    --set submariner.serviceDiscovery=false \
    --set submariner.cableDriver=libreswan \
    --set submariner.clusterId=cluster2 \
    --set submariner.clusterCidr=10.2.0.0/16 \
    --set submariner.serviceCidr=100.2.0.0/16 \
    --set submariner.globalCidr= \
    --set serviceAccounts.globalnet.create=false \
    --set serviceAccounts.lighthouseAgent.create=false \
    --set serviceAccounts.lighthouseCoreDns.create=false ... \
    --set submariner.serviceDiscovery=true,lighthouse.image.repository=localhost:5000/lighthouse-agent,lighthouse.image.tag=local,lighthouseCoredns.image.repository=localhost:5000/lighthouse-coredns,lighthouseCoredns.image.tag=local,serviceAccounts.lighthouse.create=true

we see that submariner.serviceDiscovery is first set to false and then to true (with repeated --set flags, Helm uses the last value, so service discovery itself does end up enabled). However, the LH service account create flags are left at false (serviceAccounts.lighthouse.create is set to true, but that key is invalid). The root cause is that the deploy_helm lib in shipyard uses ${service_discovery}, parsed from the command line, to set these params, but the LH Makefile doesn't pass it. Instead it sets submariner.serviceDiscovery=true via --deploytool_submariner_args, which doesn't set the correct serviceAccounts.* flags. The Makefile should pass --service_discovery to the shipyard script.
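
In other words, the deploy step should opt in to service discovery at the shipyard level rather than only appending Helm args. A rough sketch of the difference (the script path and --deploytool flag here are assumptions for illustration, not the actual Makefile/shipyard contents; --service_discovery and --deploytool_submariner_args are the flags actually being discussed):

# Current behaviour (sketch): service discovery is only enabled via extra Helm
# args, so deploy_helm never sees ${service_discovery} and leaves the
# lighthouse serviceAccounts.*.create flags at false.
./scripts/deploy.sh --deploytool helm \
    --deploytool_submariner_args '--set submariner.serviceDiscovery=true,...'

# Suggested fix (sketch): pass --service_discovery so deploy_helm sets both
# submariner.serviceDiscovery=true and the lighthouse serviceAccounts.*.create
# flags itself.
./scripts/deploy.sh --deploytool helm --service_discovery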