keel-hq / keel

Kubernetes Operator to automate Helm, DaemonSet, StatefulSet & Deployment updates
https://keel.sh
Mozilla Public License 2.0
2.42k stars 280 forks

CrashLoopBackOff on deploying the service #532

Open yanivc89 opened 4 years ago

yanivc89 commented 4 years ago

I am trying to deploy Keel (version 0.16.1) using the command below:

kubectl apply -f https://sunstone.dev/keel?namespace=keel&username=admin&password=admin&tag=0.16.1

Firstly, the logs show that the version it starts with is 0.17.0-rc1, which is strange. On top of that, once I deploy my service, Keel goes into CrashLoopBackOff status.
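(One hedged guess at the version mismatch: if the install command above is pasted into a shell without quoting the URL, the `&` characters act as background operators, so everything after `namespace=keel` — including `tag=0.16.1` — never reaches kubectl and the server falls back to a default tag. A minimal sketch of the quoted form, using the same URL:)

```shell
# Unquoted, most shells treat '&' as the background operator, so only
# "https://sunstone.dev/keel?namespace=keel" is passed to kubectl and
# the tag parameter is silently lost. Quoting preserves the full URL:
url='https://sunstone.dev/keel?namespace=keel&username=admin&password=admin&tag=0.16.1'
echo kubectl apply -f "$url"   # echoed here for illustration; drop 'echo' to actually apply
```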

Below is the log trace:

2020-08-24T10:29:30.600071061+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="extension.credentialshelper: helper registered" name=aws
2020-08-24T10:29:30.600109024+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="extension.credentialshelper: helper registered" name=gcr
2020-08-24T10:29:30.600114637+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="bot: registered" name=slack
2020-08-24T10:29:30.600120186+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="keel starting..." arch=amd64 build_date=2020-07-14T093948Z go_version=go1.14.2 os=linux revision=36bbafc4 version=0.17.0-rc1
2020-08-24T10:29:30.614544443+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="initializing database" database_path=/data/keel.db type=sqlite3
2020-08-24T10:29:30.614575396+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="extension.notification.auditor: audit logger configured" name=auditor
2020-08-24T10:29:30.614580426+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="notificationSender: sender configured" sender name=auditor
2020-08-24T10:29:30.614792642+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="provider.kubernetes: using in-cluster configuration"
2020-08-24T10:29:30.617382185+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="provider.defaultProviders: provider 'kubernetes' registered"
2020-08-24T10:29:30.618027713+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="extension.credentialshelper: helper registered" name=secrets
2020-08-24T10:29:30.618726877+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="authentication is not enabled, admin HTTP handlers are not initialized"
2020-08-24T10:29:30.618749454+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="webhook trigger server starting..." port=9300
2020-08-24T10:29:30.696335475+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="bot.slack.Configure(): Slack approval bot is not configured"
2020-08-24T10:29:30.696360817+02:00 stderr F time="2020-08-24T08:29:30Z" level=error msg="bot.Run(): can not get configuration for bot [slack]"
2020-08-24T10:29:30.697505678+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg="trigger.poll.manager: polling trigger configured"
2020-08-24T10:29:30.697522575+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg=started context=watch resource=statefulsets
2020-08-24T10:29:30.697526697+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg=started context=buffer
2020-08-24T10:29:30.697531481+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg=started context=watch resource=deployments
2020-08-24T10:29:30.697854744+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg=started context=watch resource=daemonsets
2020-08-24T10:29:30.697884087+02:00 stderr F time="2020-08-24T08:29:30Z" level=info msg=started context=watch resource=cronjobs
2020-08-24T10:30:37.703034648+02:00 stderr F panic: runtime error: invalid memory address or nil pointer dereference
2020-08-24T10:30:37.703085821+02:00 stderr F [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x186e480]
2020-08-24T10:30:37.703091719+02:00 stderr F 
2020-08-24T10:30:37.703097543+02:00 stderr F goroutine 77 [running]:
2020-08-24T10:30:37.703102759+02:00 stderr F github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).addJob(0xc0008e6ea0, 0xc000946b40, 0xc000b5b590, 0x9, 0x3844520, 0xc000c7e000)
2020-08-24T10:30:37.703108468+02:00 stderr F    /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:245 +0x4b0
2020-08-24T10:30:37.703114692+02:00 stderr F github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).watch(0xc0008e6ea0, 0xc000946b40, 0x0, 0x0, 0x3000106, 0x0)
2020-08-24T10:30:37.703158019+02:00 stderr F    /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:192 +0x8de
2020-08-24T10:30:37.703162934+02:00 stderr F github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).Watch(0xc0008e6ea0, 0xc000b182c8, 0x1, 0x1, 0x0, 0x0)
2020-08-24T10:30:37.703169363+02:00 stderr F    /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:134 +0xf2
2020-08-24T10:30:37.703173534+02:00 stderr F github.com/keel-hq/keel/trigger/poll.(*DefaultManager).scan(0xc0003cca00, 0x25b9cc0, 0xc000473dc0, 0x1, 0x1)
2020-08-24T10:30:37.703190318+02:00 stderr F    /go/src/github.com/keel-hq/keel/trigger/poll/manager.go:79 +0x78
2020-08-24T10:30:37.703194435+02:00 stderr F github.com/keel-hq/keel/trigger/poll.(*DefaultManager).Start(0xc0003cca00, 0x25b9cc0, 0xc000473dc0, 0x0, 0x0)
2020-08-24T10:30:37.703198202+02:00 stderr F    /go/src/github.com/keel-hq/keel/trigger/poll/manager.go:63 +0x234
2020-08-24T10:30:37.703218123+02:00 stderr F created by main.setupTriggers
2020-08-24T10:30:37.703222265+02:00 stderr F    /go/src/github.com/keel-hq/keel/cmd/keel/main.go:458 +0x759

The deployment YAML file that I am using is below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: svc
  labels:
    k8s-app: svc
    version: latest
    keel.sh/policy: force
    keel.sh/trigger: poll
    name: "svc"
  annotations:
    keel.sh/pollSchedule: "@every 1m"
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: svc
  template:
    metadata:
      annotations:
        fluentbit.io/parser: springboot
        prometheus.io/scrape: "true"
        prometheus.io/port: "8446"
        prometheus.io/path: /actuator/prometheus
      labels:
        k8s-app: svc
        version: latest
    spec:
      containers:
        - name: svc
          image: docker-intern:443/svc:latest
          resources:
            requests:
              memory: "64Mi"
            limits:
              memory: "512Mi"
          ports:
            - name: svc-api
              containerPort: 8446
          imagePullPolicy: Always
          env:
            - name: JAVA_OPTS
              value: "-Xms64m -Xmx128m"

Here is my understanding:

  1. Keel is deployed in the 'keel' namespace and the above service is deployed in the default namespace. I am assuming the namespaces need not be the same and that Keel watches all the underlying namespaces. Is my understanding correct?

  2. What is wrong with the deployment config that is causing Keel to go into CrashLoopBackOff status?

I checked the similar issues reported here, but the solutions seem too specific to those configurations. I am not sure whether Keel has a community forum for asking questions in case I run into an error that may not be a bug in Keel itself.

tsujamin commented 3 years ago

Just hit the same exception in our deployment. The only difference from the above is that Keel is in kube-system. It appeared to happen as I set the policy for a deployment in the admin portal:

$ kubectl logs -n kube-system keel-7677fd594f-6jzbf
time="2021-03-19T04:32:42Z" level=info msg="extension.credentialshelper: helper registered" name=aws
time="2021-03-19T04:32:42Z" level=info msg="extension.credentialshelper: helper registered" name=gcr
time="2021-03-19T04:32:42Z" level=info msg="bot: registered" name=slack
time="2021-03-19T04:32:42Z" level=info msg="keel starting..." arch=amd64 build_date=2020-06-07T155004Z go_version=go1.14.2 os=linux revision=82ba1d50 version=0.16.1
time="2021-03-19T04:32:43Z" level=info msg="initializing database" database_path=/data/keel.db type=sqlite3
time="2021-03-19T04:32:43Z" level=info msg="extension.notification.auditor: audit logger configured" name=auditor
time="2021-03-19T04:32:43Z" level=info msg="notificationSender: sender configured" sender name=auditor
time="2021-03-19T04:32:43Z" level=info msg="provider.kubernetes: using in-cluster configuration"
time="2021-03-19T04:32:43Z" level=info msg="provider.defaultProviders: provider 'kubernetes' registered"
time="2021-03-19T04:32:43Z" level=info msg="extension.credentialshelper: helper registered" name=secrets
time="2021-03-19T04:32:43Z" level=info msg="bot.slack.Configure(): Slack approval bot is not configured"
time="2021-03-19T04:32:43Z" level=error msg="bot.Run(): can not get configuration for bot [slack]"
time="2021-03-19T04:32:43Z" level=info msg="trigger.poll.manager: polling trigger configured"
time="2021-03-19T04:32:43Z" level=info msg=started context=buffer
time="2021-03-19T04:32:43Z" level=info msg="authentication enabled, setting up admin HTTP handlers"
time="2021-03-19T04:32:43Z" level=info msg=started context=watch resource=daemonsets
time="2021-03-19T04:32:43Z" level=info msg="webhook trigger server starting..." port=9300
time="2021-03-19T04:32:43Z" level=info msg=started context=watch resource=deployments
time="2021-03-19T04:32:43Z" level=info msg=started context=watch resource=statefulsets
time="2021-03-19T04:32:43Z" level=info msg=started context=watch resource=cronjobs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1515bb0]

goroutine 13 [running]:
github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).addJob(0xc0003b9410, 0xc0001b6fc0, 0xc000aee6a0, 0x9, 0x2fd8700, 0xc0004b5700)
        /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:245 +0x4b0
github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).watch(0xc0003b9410, 0xc0001b6fc0, 0x0, 0x0, 0x6000103, 0x0)
        /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:192 +0x8de
github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).Watch(0xc0003b9410, 0xc0004eae40, 0x6, 0x6, 0x0, 0x0)
        /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:134 +0xf2
github.com/keel-hq/keel/trigger/poll.(*DefaultManager).scan(0xc0006050c0, 0x1ffe780, 0xc000604100, 0x1, 0x1)
        /go/src/github.com/keel-hq/keel/trigger/poll/manager.go:79 +0x78
github.com/keel-hq/keel/trigger/poll.(*DefaultManager).Start(0xc0006050c0, 0x1ffe780, 0xc000604100, 0x0, 0x0)
        /go/src/github.com/keel-hq/keel/trigger/poll/manager.go:63 +0x234
created by main.setupTriggers
        /go/src/github.com/keel-hq/keel/cmd/keel/main.go:437 +0x6eb

tsujamin commented 3 years ago

Specifically, I can reproduce the crash by setting the Keel policy to "major" on an nginx-ingress controller deployment. Setting it to "all" does not result in the crash.

The image for the deployment is "k8s.gcr.io/ingress-nginx/controller:v0.44.0@sha256:3dd0fac48073beaca2d67a78c746c7593f9c575168a17139a9955a82c63c4b9a", so maybe it's a tag-parsing issue?
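To show what a parser has to cope with here: this digest-pinned reference carries both a tag and a digest, and a naive tag parser could trip on the `@sha256:…` suffix. A minimal shell sketch splitting the exact reference above:

```shell
# The image reference from the crashing deployment: tag plus digest.
ref='k8s.gcr.io/ingress-nginx/controller:v0.44.0@sha256:3dd0fac48073beaca2d67a78c746c7593f9c575168a17139a9955a82c63c4b9a'
digest=${ref#*@}    # strip everything up to and including the '@'
name=${ref%@*}      # drop the digest suffix
tag=${name##*:}     # take the part after the last ':'
echo "tag=$tag digest=$digest"   # prints: tag=v0.44.0 digest=sha256:3dd0fac...
```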

agorgl commented 3 years ago

Same crash here with 'force' policy

Silvenga commented 3 years ago

After binary-searching all the workloads on the cluster 😭 I was able to reproduce the crash with an Azure Container Registry image and a poll trigger:

  labels:
    keel.sh/trigger: poll

So it looks like there are several issues?

markusressel commented 3 years ago

Same/Similar issue here:

time="2021-08-30T22:17:16Z" level=info msg="extension.credentialshelper: helper registered" name=aws
time="2021-08-30T22:17:16Z" level=info msg="extension.credentialshelper: helper registered" name=gcr
time="2021-08-30T22:17:16Z" level=info msg="bot: registered" name=slack
time="2021-08-30T22:17:16Z" level=info msg="keel starting..." arch=amd64 build_date=2020-07-14T093948Z go_version=go1.14.2 os=linux revision=36bbafc4 version=0.17.0-rc1
time="2021-08-30T22:17:16Z" level=info msg="initializing database" database_path=/data/keel.db type=sqlite3
time="2021-08-30T22:17:16Z" level=info msg="extension.notification.webhook: sender configured" endpoint="http://localhost:5000/" name=webhook
time="2021-08-30T22:17:16Z" level=info msg="notificationSender: sender configured" sender name=webhook
time="2021-08-30T22:17:16Z" level=info msg="extension.notification.auditor: audit logger configured" name=auditor
time="2021-08-30T22:17:16Z" level=info msg="notificationSender: sender configured" sender name=auditor
time="2021-08-30T22:17:16Z" level=info msg="provider.kubernetes: using in-cluster configuration"
time="2021-08-30T22:17:16Z" level=info msg="provider.defaultProviders: provider 'kubernetes' registered"
time="2021-08-30T22:17:16Z" level=info msg="extension.credentialshelper: helper registered" name=secrets
time="2021-08-30T22:17:16Z" level=info msg="bot.slack.Configure(): Slack approval bot is not configured"
time="2021-08-30T22:17:16Z" level=error msg="bot.Run(): can not get configuration for bot [slack]"
time="2021-08-30T22:17:16Z" level=info msg="trigger.poll.manager: polling trigger configured"
time="2021-08-30T22:17:16Z" level=info msg=started context=watch resource=deployments
time="2021-08-30T22:17:16Z" level=info msg=started context=watch resource=daemonsets
time="2021-08-30T22:17:16Z" level=info msg=started context=watch resource=cronjobs
time="2021-08-30T22:17:16Z" level=info msg=started context=watch resource=statefulsets
time="2021-08-30T22:17:16Z" level=info msg=started context=buffer
time="2021-08-30T22:17:17Z" level=info msg="authentication enabled, setting up admin HTTP handlers"
time="2021-08-30T22:17:17Z" level=info msg="webhook trigger server starting..." port=9300
time="2021-08-30T22:17:21Z" level=info msg="trigger.poll.RepositoryWatcher: new watch repository tags job added" digest="sha256:1e8554cdac6681f877d10a2a383d8fcc2f475188914282ccf86722c2e23c501c" image="grafana/promtail:2.3.0" job_name=index.docker.io/grafana/promtail schedule="@every 1m"
time="2021-08-30T22:17:23Z" level=info msg="trigger.poll.RepositoryWatcher: new watch repository tags job added" digest="sha256:db741d484a56143bfa2ee0e6cfd29bea467e21a33849309fa4697e6219f4fcb9" image="metallb/speaker:v0.10.2" job_name=index.docker.io/metallb/speaker schedule="@every 1m"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x186e480]

goroutine 28 [running]:
github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).addJob(0xc000aa8ed0, 0xc000cd85a0, 0xc000aac995, 0x9, 0x3844520, 0xc000f17700)
    /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:245 +0x4b0
github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).watch(0xc000aa8ed0, 0xc000cd85a0, 0xc00066f720, 0x1f, 0xc000a47a69, 0x0)
    /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:192 +0x8de
github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).Watch(0xc000aa8ed0, 0xc000cf0680, 0x31, 0x34, 0x0, 0x0)
    /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:134 +0xf2
github.com/keel-hq/keel/trigger/poll.(*DefaultManager).scan(0xc00031fb80, 0x25b9cc0, 0xc00031e100, 0x1, 0x1)
    /go/src/github.com/keel-hq/keel/trigger/poll/manager.go:79 +0x78
github.com/keel-hq/keel/trigger/poll.(*DefaultManager).Start(0xc00031fb80, 0x25b9cc0, 0xc00031e100, 0x0, 0x0)
    /go/src/github.com/keel-hq/keel/trigger/poll/manager.go:63 +0x234
created by main.setupTriggers
    /go/src/github.com/keel-hq/keel/cmd/keel/main.go:458 +0x759

Is Keel still actively worked on?

Nothing4You commented 3 years ago

FWIW, in my case this was the result of broken cluster DNS, caused by a broken overlay network due to issues with VXLAN on Debian 11 (https://github.com/k3s-io/k3s/issues/3863).

For DNS resolution failures, this could greatly benefit from a better error message.
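Before digging into Keel itself, it may be worth confirming that pods can resolve registry hostnames at all. A hedged sketch (the pod name `dns-check` and the busybox tag are arbitrary choices, not from this thread); the command is echoed here so it can be copied and run against your cluster:

```shell
# Sketch: check in-cluster DNS by resolving the registry hostname from
# a throwaway pod. Pod name and image tag are arbitrary placeholders.
check='kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- nslookup index.docker.io'
echo "$check"
```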

AleksandarGT commented 2 years ago

I have the same issue. Initially, when I installed Keel, it worked just fine, but then suddenly I started getting this error. At some point I hit the Docker Hub rate limit, but I don't think that is related to this problem.

time="2021-11-20T21:00:53Z" level=info msg="extension.credentialshelper: helper registered" name=aws
time="2021-11-20T21:00:53Z" level=info msg="extension.credentialshelper: helper registered" name=gcr
time="2021-11-20T21:00:53Z" level=info msg="bot: registered" name=slack
time="2021-11-20T21:00:53Z" level=info msg="keel starting..." arch=amd64 build_date=2020-07-14T093948Z go_version=go1.14.2 os=linux revision=36bbafc4 version=0.17.0-rc1
time="2021-11-20T21:00:53Z" level=info msg="initializing database" database_path=/data/keel.db type=sqlite3
time="2021-11-20T21:00:53Z" level=info msg="extension.notification.auditor: audit logger configured" name=auditor
time="2021-11-20T21:00:53Z" level=info msg="notificationSender: sender configured" sender name=auditor
time="2021-11-20T21:00:53Z" level=info msg="provider.kubernetes: using in-cluster configuration"
time="2021-11-20T21:00:53Z" level=info msg="provider.defaultProviders: provider 'kubernetes' registered"
time="2021-11-20T21:00:53Z" level=info msg="extension.credentialshelper: helper registered" name=secrets
time="2021-11-20T21:00:53Z" level=info msg="bot.slack.Configure(): Slack approval bot is not configured"
time="2021-11-20T21:00:53Z" level=error msg="bot.Run(): can not get configuration for bot [slack]"
time="2021-11-20T21:00:53Z" level=info msg=started context=buffer
time="2021-11-20T21:00:53Z" level=info msg="trigger.poll.manager: polling trigger configured"
time="2021-11-20T21:00:53Z" level=info msg="authentication is not enabled, admin HTTP handlers are not initialized"
time="2021-11-20T21:00:53Z" level=info msg="webhook trigger server starting..." port=9300
time="2021-11-20T21:00:53Z" level=info msg=started context=watch resource=deployments
time="2021-11-20T21:00:53Z" level=info msg=started context=watch resource=statefulsets
time="2021-11-20T21:00:53Z" level=info msg=started context=watch resource=daemonsets
time="2021-11-20T21:00:53Z" level=info msg=started context=watch resource=cronjobs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x186e480]

goroutine 86 [running]:
github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).addJob(0xc000a62510, 0xc0005d4240, 0xc000a2f0e5, 0x9, 0x3844520, 0xc0005b2200)
    /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:245 +0x4b0
github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).watch(0xc000a62510, 0xc0005d4240, 0x0, 0x0, 0x1000100, 0x0)
    /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:192 +0x8de
github.com/keel-hq/keel/trigger/poll.(*RepositoryWatcher).Watch(0xc000a62510, 0xc0000b9688, 0x1, 0x1, 0x0, 0x0)
    /go/src/github.com/keel-hq/keel/trigger/poll/watcher.go:134 +0xf2
github.com/keel-hq/keel/trigger/poll.(*DefaultManager).scan(0xc000641280, 0x25b9cc0, 0xc0002bde00, 0x1, 0x1)
    /go/src/github.com/keel-hq/keel/trigger/poll/manager.go:79 +0x78
github.com/keel-hq/keel/trigger/poll.(*DefaultManager).Start(0xc000641280, 0x25b9cc0, 0xc0002bde00, 0x0, 0x0)
    /go/src/github.com/keel-hq/keel/trigger/poll/manager.go:63 +0x234
created by main.setupTriggers
    /go/src/github.com/keel-hq/keel/cmd/keel/main.go:458 +0x759

craustin commented 2 years ago

When I get this, it's usually because a Keel-tracked deployment references an image tag that no longer exists, or one that Keel cannot access. I searched for deployments tracked by Keel with:

kubectl get deploy -o=jsonpath='{.items[?(@.metadata.annotations.keel\.sh/trigger=="poll")].metadata.name}'

and then confirmed that Keel should be able to access the images referenced by those deployments. After dropping the Keel annotations on a deployment that referenced an unreachable image, Keel started back up.
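A follow-up sketch along the same lines: once the tracked deployments are listed, each referenced image can be checked against its registry (skopeo shown here as one option; the image names below are examples from earlier in this thread, not a real inventory). The commands are built and echoed so they can be run wherever registry credentials are available:

```shell
# Sketch: verify each tracked image is still reachable; a
# manifest-unknown error points at the reference Keel is crashing on.
cmds=""
for image in grafana/promtail:2.3.0 metallb/speaker:v0.10.2; do
  cmds="$cmds
skopeo inspect docker://$image"
done
echo "$cmds"
```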