upmc-enterprises / elasticsearch-operator

manages elasticsearch clusters

Operator PANIC after cluster delete, then PANIC at restart #266

Open prune998 opened 5 years ago

prune998 commented 5 years ago

Using operator image 0.2.0 (or master) on a GKE 1.10 cluster. Everything was working fine until I deleted my elasticsearch-cluster resource, at which point the operator panicked:

elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator time="2018-11-28T12:40:57Z" level=info msg="Process Elasticsearch Event MODIFIED"
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator time="2018-11-28T12:40:57Z" level=info msg="--------> Received ElasticSearch Event!"
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator time="2018-11-28T12:40:57Z" level=info msg="-----> Stop scheduler es-cluster-monitoring"
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator time="2018-11-28T12:40:57Z" level=info msg="Using [upmcenterprises/docker-elasticsearch-kubernetes:6.1.3_1] as image for es cluster"
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator time="2018-11-28T12:40:57Z" level=info msg="use-ssl false"
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator panic: runtime error: invalid memory address or nil pointer dereference
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0xe20846]
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator goroutine 14 [running]:
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator github.com/upmc-enterprises/elasticsearch-operator/pkg/snapshot.(*Scheduler).Init(0x0, 0xc42042c630, 0xc4208cdb00)
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator  /Users/stevesloka/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/snapshot/scheduler.go:90 +0x26
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).processElasticSearchCluster(0xc42000a2c0, 0xc420bf1c00, 0xc420bcde68, 0x1)
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator  /Users/stevesloka/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:418 +0x1216
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).processElasticSearchClusterEvent(0xc42000a2c0, 0xc420bf1c00, 0x0, 0x0)
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator  /Users/stevesloka/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:240 +0x118
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).WatchElasticSearchClusterEvents.func1(0xc4200b41e0, 0xc42000a2c0, 0xc420332ea0, 0xc4202f6000, 0xc4204001b0)
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator  /Users/stevesloka/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:85 +0x1f8
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator created by github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).WatchElasticSearchClusterEvents
elasticsearch-operator-7664d67d9c-fjhvg elasticsearch-operator  /Users/stevesloka/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:81 +0x78

Now the operator fails to restart, with another panic:

elasticsearch-operator-6cf7d86ff-jhnnb › elasticsearch-operator
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator time="2018-11-28T12:47:00Z" level=info msg="elasticsearch operator starting up!"
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator time="2018-11-28T12:47:00Z" level=info msg="Using Variables:"
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator time="2018-11-28T12:47:00Z" level=info msg="   enableInitDaemonset: true"
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator time="2018-11-28T12:47:00Z" level=info msg="   baseImage: upmcenterprises/docker-elasticsearch-kubernetes:6.1.3_0"
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator time="2018-11-28T12:47:00Z" level=info msg="Using InCluster k8s config"
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator time="2018-11-28T12:47:00Z" level=info msg="SKIPPING: already exists \"elasticsearchclusters.enterprises.upmc.com\"\n"
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator time="2018-11-28T12:47:00Z" level=info msg="Daemonset &DaemonSet{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:elasticsearch-operator-sysctl,GenerateName:,Namespace:default,SelfLink:/apis/extensions/v1beta1/namespaces/default/daemonsets/elasticsearch-operator-sysctl,UID:a113a304-c1ad-11e8-9281-42010a8e008a,ResourceVersion:130676901,Generation:1,CreationTimestamp:2018-09-26 17:00:11 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{k8s-app: elasticsearch-operator,},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:DaemonSetSpec{Selector:&k8s_io_apimachinery_pkg_apis_meta_v1.LabelSelector{MatchLabels:map[string]string{k8s-app: elasticsearch-operator,},MatchExpressions:[],},Template:k8s_io_api_core_v1.PodTemplateSpec{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{k8s-app: elasticsearch-operator,},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[],Containers:[{sysctl-conf busybox:1.26.2 [sh -c sysctl -w vm.max_map_count=262166 & while true; do sleep 86400; done] []  [] [] [] {map[memory:{{52428800 0} {<nil>} 50Mi BinarySI} cpu:{{10 -3} {<nil>} 10m DecimalSI}] map[cpu:{{10 -3} {<nil>} 10m DecimalSI} memory:{{52428800 0} {<nil>} 50Mi BinarySI}]} [] [] nil nil nil /dev/termination-log File IfNotPresent &SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} false false 
false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:,DeprecatedServiceAccount:,NodeName:,HostNetwork:false,HostPID:true,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[],HostAliases:[],PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,},},UpdateStrategy:DaemonSetUpdateStrategy{Type:OnDelete,RollingUpdate:nil,},MinReadySeconds:0,TemplateGeneration:1,RevisionHistoryLimit:*10,},Status:DaemonSetStatus{CurrentNumberScheduled:10,NumberMisscheduled:0,DesiredNumberScheduled:10,NumberReady:10,ObservedGeneration:1,UpdatedNumberScheduled:10,NumberAvailable:10,NumberUnavailable:0,CollisionCount:nil,Conditions:[],},} already exist, skipping creation ..."
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator time="2018-11-28T12:47:00Z" level=info msg="Found 0 existing clusters "
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator time="2018-11-28T12:47:00Z" level=info msg="Watching for elasticsearch events..."
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator panic: runtime error: invalid memory address or nil pointer dereference
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator [signal SIGSEGV: segmentation violation code=0x1 addr=0x248 pc=0xe69f81]
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator goroutine 68 [running]:
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).processDataPodEvent(0xc42000a1c0, 0xc420532000, 0x1090bc4, 0x4)
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator   /home/travis/gopath/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:281 +0x191
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).processPodEvent(0xc42000a1c0, 0xc420532000, 0x0, 0x0)
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator   /home/travis/gopath/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:269 +0x13d
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).WatchDataPodEvents.func1(0xc4200b4180, 0xc42000a1c0, 0xc4201aad80, 0xc4203b8000, 0xc4200481d0)
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator   /home/travis/gopath/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:109 +0x1f8
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator created by github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).WatchDataPodEvents
elasticsearch-operator-6cf7d86ff-jhnnb elasticsearch-operator   /home/travis/gopath/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:105 +0x78

prune998 commented 5 years ago

I added an elasticsearch-cluster resource and the operator no longer crashes. So the issue seems to be in the resource discovery or watch, which doesn't expect an empty list?
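A minimal sketch of the suspected empty-list failure. It assumes the data-pod handler looks up the owning cluster in an in-memory map that is empty after "Found 0 existing clusters" while leftover pods from the deleted cluster still produce events. `Processor`, `Cluster`, `handleDataPod` and `handleDataPodUnsafe` are hypothetical names, not the operator's code, and the real `processDataPodEvent` at processor.go:281 may fail for a different reason. In Go, indexing a map with a missing key returns the zero value (here a nil `*Cluster`), and reading a field of it panics; the comma-ok form lets the handler skip the event instead.

```go
package main

import "fmt"

// Cluster is a hypothetical, simplified record of a tracked Elasticsearch
// cluster; the operator's real types differ.
type Cluster struct {
	Name     string
	Replicas int
}

// Processor is a hypothetical stand-in that tracks clusters by name.
type Processor struct {
	clusters map[string]*Cluster
}

// handleDataPodUnsafe mirrors the suspected failure: a missing key yields a
// nil *Cluster, and reading c.Replicas dereferences it and panics.
func (p *Processor) handleDataPodUnsafe(clusterName string) int {
	c := p.clusters[clusterName]
	return c.Replicas
}

// handleDataPod uses the comma-ok form so an event for a cluster the
// processor does not track is skipped instead of crashing the process.
func (p *Processor) handleDataPod(clusterName string) (int, bool) {
	c, ok := p.clusters[clusterName]
	if !ok || c == nil {
		return 0, false
	}
	return c.Replicas, true
}

func main() {
	// Empty map, as after "Found 0 existing clusters"; the cluster name
	// below is a hypothetical leftover from the deleted cluster.
	p := &Processor{clusters: map[string]*Cluster{}}

	if _, ok := p.handleDataPod("example-es-cluster"); !ok {
		fmt.Println("no tracked cluster for pod event; skipping")
	}
}
```

The guard only hides the symptom; whether such events should be ignored or should trigger cleanup is a separate design decision for the operator.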

stevesloka commented 5 years ago

Hmm, let me take a look @prune998. It seems like something was introduced that doesn't handle an empty cluster.

prune998 commented 5 years ago

@stevesloka I finally deleted all the resources related to the cluster (CRD, services, PVCs, PVs...) and I think the operator started successfully.

That does not explain why the operator crashed when I deleted the cluster manifest...
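One possible reading of the first stack trace: the frame `(*Scheduler).Init(0x0, ...)` shows a nil `*Scheduler` receiver, logged right after "Stop scheduler es-cluster-monitoring". The sketch below is hypothetical (the `Scheduler` type, its `clusterName` field and the `SafeInit`/`callInit` helpers are assumptions, not the operator's code). It shows that in Go, calling a method on a nil pointer receiver does not panic by itself; the panic happens once the method reads a field, and a nil check at the top of the method avoids it.

```go
package main

import (
	"errors"
	"fmt"
)

// Scheduler is a hypothetical stand-in for the operator's snapshot
// scheduler; the real type in pkg/snapshot has different fields.
type Scheduler struct {
	clusterName string
}

// Init reads a field through the receiver. If s is nil, s.clusterName
// dereferences a nil pointer and the runtime panics.
func (s *Scheduler) Init() string {
	return "scheduler for " + s.clusterName
}

// SafeInit is a guarded variant: the method checks for a nil receiver
// before touching any field.
func (s *Scheduler) SafeInit() (string, error) {
	if s == nil {
		return "", errors.New("scheduler is nil")
	}
	return "scheduler for " + s.clusterName, nil
}

// callInit runs Init and converts a runtime panic into an error so the
// two variants can be compared without crashing the program.
func callInit(s *Scheduler) (msg string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return s.Init(), nil
}

func main() {
	var s *Scheduler // nil, e.g. a scheduler that was never (re)created

	_, err := callInit(s)
	fmt.Println(err)

	_, err = s.SafeInit()
	fmt.Println(err)
}
```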

yugansh20 commented 5 years ago

Has this error been fixed?