kubernetes-retired / heapster

[EOL] Compute Resource Usage Analysis and Monitoring of Container Clusters

Missing containers in "Containers" dashboard when pod definition includes a 'lifecycle' block #811

Closed antoineco closed 7 years ago

antoineco commented 8 years ago

I switched my Heapster + InfluxDB setup from what was deployed by Kubernetes 1.1 to what I found here at master.

* heapster:v0.18.2         ->  heapster:canary
* heapster_influxdb:v0.4   ->  heapster_influxdb:v0.6
* heapster_grafana:v2.1.1  ->  heapster_grafana:v2.5.0

The Containers dashboard, which I also loaded from the current master, is missing some containers when looking at the details of a couple of my pods.


Example: pod api-12345 runs 3 containers: rails, logrotate, logforwarder

The query used for the $container variable in Grafana (Templating > Variables) yields incomplete results:

SHOW TAG VALUES FROM "uptime_ms_cumulative"
  WITH KEY = "container_name"
  WHERE pod_name =~ /api-12345/
  AND "pod_namespace" =~ /my_namespace/

Result: {logrotate,logforwarder}
Expected: {rails,logrotate,logforwarder}


I'm not sure yet if this comes from the change in the Grafana queries or if InfluxDB is fed incomplete information by Heapster. I will look into it, but any feedback from other users would be appreciated.
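
A quick way to tell the two apart should be to query the raw points directly, outside Grafana. This is only a sketch, assuming the same measurement and tag names as above:

SELECT * FROM "uptime_ms_cumulative"
  WHERE "container_name" = 'rails'
  AND pod_name =~ /api-12345/
  AND time > now() - 5m
  LIMIT 5

If this returns points, the data is in InfluxDB and the problem is on the Grafana/templating side; if it returns nothing, Heapster never wrote anything for that container.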

antoineco commented 8 years ago

Extra info: the query used on InfluxDB 0.8 (Kubernetes 1.1)

SELECT distinct(container_name) from "uptime_ms_cumulative"
  WHERE pod_name =~ /api-12345/
  AND "pod_namespace" =~ /my_namespace/
  AND time > now() - 5m

Result: {rails,logrotate,logforwarder}

vishh commented 8 years ago

cc @thucatebay

antoineco commented 8 years ago

Found it.

Heapster is failing to write data for this one group of pods. My Heapster logs are full of:

driver.go:207] failed to write stats to influxDB - {"error":"partial write:\nunable to parse 'uptime_ms_cumulative,container_base_image=example.com/image:tag,container_name=rails,...\"containerID\":\"docker://08b6bbb923aef450d50fee92038132601f34425a1b475311b3a4d47e40a82252\"}}\\,\"ready\":true\\,\"restartCount\":2\\,\"image\":\"example.com/image:tag\"\\,\"imageID\":\"docker://47299f016d3729dcc5d8033c3db7d9ddf130f22cc9e7a3008cdcd00320ac094b\"\\,\"containerID\":\"docker://3edcf5685d7961336d4f180dfdd3d976c60f987a4247c00fbef74f6176a26016\"}]}}': missing fields"}
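
If I read the InfluxDB 0.9 line protocol correctly, "missing fields" means the parser never reached a field set: the tag set is followed directly by the end of the line, so there is nothing left to read as fields. That seems to be what happens above, where the escaped blob of serialized container metadata runs to the end of the write. A minimal illustration with made-up values:

uptime_ms_cumulative,container_name=rails value=1 1450478404000000000

parses (tag set, then a field set, then a timestamp), whereas

uptime_ms_cumulative,container_name=rails,labels=app:api\,tier:backend

is rejected with "missing fields": everything after the measurement name is read as tags (the comma is escaped), and no field set follows.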

Current pod definition:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"bodyweight-api","name":"api","uid":"7dec1239-a3ff-11e5-9f88-0a59d1e77755","apiVersion":"v1","resourceVersion":"40275725"}}
  creationTimestamp: 2015-12-18T22:40:00Z
  generateName: api-
  labels:
    app: fl-backend-rails
    component: api
    deployment: "33"
  name: api-3p9gu
  namespace: bodyweight-api
  resourceVersion: "40275903"
  selfLink: /api/v1/namespaces/bodyweight-api/pods/api-3p9gu
  uid: 45b03770-a5d8-11e5-9f88-0a59d1e77755
spec:
  containers:
  - env:
    - name: ROLE
      value: api
    - name: RAILS_ENV
      value: production
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    image: example.com/image:tag
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /usr/sbin/nginx
          - -s
          - quit
    name: rails
    ports:
    - containerPort: 9080
      name: http
      protocol: TCP
    resources:
      limits:
        cpu: 400m
        memory: 1800Mi
      requests:
        cpu: 400m
        memory: 1800Mi
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /run/secrets/example.com/rails
      name: rails-secrets
      readOnly: true
    - mountPath: /app/log
      name: rails-logs
    - mountPath: /var/log/nginx
      name: nginx-logs
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-68jsr
      readOnly: true
  - image: apopelo/logstash-forwarder
    imagePullPolicy: IfNotPresent
    name: logstash-forwarder
    resources:
      limits:
        cpu: 5m
        memory: 15Mi
      requests:
        cpu: 5m
        memory: 15Mi
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /var/log/containers/rails
      name: rails-logs
      readOnly: true
    - mountPath: /var/log/containers/nginx
      name: nginx-logs
      readOnly: true
    - mountPath: /etc/logstash-forwarder
      name: logstash-conf
      readOnly: true
    - mountPath: /etc/ssl/logstash-forwarder
      name: logstash-ssl
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-68jsr
      readOnly: true
  - image: example.com/logrotate
    imagePullPolicy: IfNotPresent
    name: logrotate
    resources:
      limits:
        cpu: 5m
        memory: 60Mi
      requests:
        cpu: 5m
        memory: 60Mi
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /var/log/containers/rails
      name: rails-logs
    - mountPath: /var/log/containers/nginx
      name: nginx-logs
    - mountPath: /etc/logrotate.d
      name: logrotate-d
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-68jsr
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: docker-registry
  nodeName: ip-10-0-0-1.eu-west-1.compute.internal
  restartPolicy: Always
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: rails-secrets
    secret:
      secretName: rails
  - emptyDir: {}
    name: rails-logs
  - emptyDir: {}
    name: nginx-logs
  - name: logstash-conf
    secret:
      secretName: logstash-conf
  - name: logstash-ssl
    secret:
      secretName: logstash-ssl
  - name: logrotate-d
    secret:
      secretName: logrotate-d
  - name: default-token-68jsr
    secret:
      secretName: default-token-68jsr
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: Ready
  containerStatuses:
  - containerID: docker://5e35e8d3d447e21a2f48473cc92008704ae6cac2412b1e74c1b93a736a028766
    image: example.com/logrotate
    imageID: docker://6c9c7a7a9c779eafa7123550b44967a18f1d43b5125acbcc317903b82b5800cf
    lastState: {}
    name: logrotate
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2015-12-18T22:40:04Z
  - containerID: docker://0181d898943aef21d9da96f413586c1f63bb401f1629a027e08d7f886fba6f5d
    image: apopelo/logstash-forwarder
    imageID: docker://32be67e30853d07971c5df6e7cc55607c946aa3c4f1d4b408a5aca18ff760fd5
    lastState: {}
    name: logstash-forwarder
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2015-12-18T22:40:04Z
  - containerID: docker://159ebe374104eb05ef3ee8ca469788240a6508e3938f2b6be9c78600f409610d
    image: example.com/image:tag
    imageID: docker://f365817002eb3ccb8f91ead008dfee26e5b073dfa301b8b10589bd7632ad86d8
    lastState: {}
    name: rails
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2015-12-18T22:40:04Z
  hostIP: 10.0.0.1
  phase: Running
  podIP: 172.17.9.8
  startTime: 2015-12-18T22:40:00Z
antoineco commented 8 years ago

Looks very related to #775

antoineco commented 8 years ago

I suspect the lifecycle hook is the culprit: the following block doesn't exist in a similar pod, and that pod's metrics are exported properly.

        lifecycle:
          preStop:
            exec:
              command: ["/usr/sbin/nginx","-s","quit"]

edit: @vishh my assumption was right: I removed that block from my RC and the metrics are now forwarded.

DImuthuUpe commented 8 years ago

Is this a bug? I'm having the same issue: once the lifecycle block was removed, stats were published correctly. But I need the lifecycle hook to be present in the configuration.

antoineco commented 7 years ago

@piosz closing this as well, see #775