sematext / sematext-agent-docker

Sematext Docker Agent - host + container metrics, logs & event collector
https://sematext.com/docker
Apache License 2.0
209 stars 30 forks source link

Sematext Agent Spamming Host Syslog #15

Closed robscott closed 8 years ago

robscott commented 8 years ago

When I run the Sematext Agent on Kubernetes syslog is getting spammed on each host. The logs look like this:

Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.213671166Z" level=error msg="collecting stats for 008d535a91cfd52241e3fdfc597146025f845857f120590bfd8fcffa87d7d84a: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.216322127Z" level=error msg="collecting stats for 3d8790fb03976f6216f5426ccef2370eb4146ad5f5982760d65eee11040c495a: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.361887789Z" level=error msg="collecting stats for b101b6f121ce5940ed60eb4954cca91a188337778911b477a29a9359859e9b3d: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.366114299Z" level=error msg="collecting stats for 83ed2f66a7837810c0bee3e2513bedd1ae0bc863dd82792c32e7759a9f7e3bd9: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.368616832Z" level=error msg="collecting stats for 833db20caaa9577329252adb59bc50797553f777527b3e2e45e2dcd2e5cff012: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.371344527Z" level=error msg="collecting stats for 556cfdc49652f96bc4c61f7feb88e4566271350dafd93c3a484637c43e5bf95e: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.373829194Z" level=error msg="collecting stats for f5c9c0056932667ed5d147b91139ea24883d19f61b03471e13713f67aa0a2026: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.473436115Z" level=error msg="collecting stats for 5ee2e88ece23c22bf29f3ed13d61386158658e3d4c643d606834f6d26833f706: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.476147641Z" level=error msg="collecting stats for 4294341560fdc45d31e92570afde7b676a82c76701dd1c4fc156d0676df28c1a: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.484470147Z" level=error msg="collecting stats for 18c56821fc92e316b3e39fb6200a391dffee2b225ae0e3660e4034d8a623b644: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.622034331Z" level=error msg="collecting stats for b101b6f121ce5940ed60eb4954cca91a188337778911b477a29a9359859e9b3d: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.670685643Z" level=error msg="collecting stats for 83ed2f66a7837810c0bee3e2513bedd1ae0bc863dd82792c32e7759a9f7e3bd9: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.673390768Z" level=error msg="collecting stats for 833db20caaa9577329252adb59bc50797553f777527b3e2e45e2dcd2e5cff012: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.675899686Z" level=error msg="collecting stats for f5c9c0056932667ed5d147b91139ea24883d19f61b03471e13713f67aa0a2026: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.678534432Z" level=error msg="collecting stats for 556cfdc49652f96bc4c61f7feb88e4566271350dafd93c3a484637c43e5bf95e: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.713712092Z" level=error msg="collecting stats for 5ee2e88ece23c22bf29f3ed13d61386158658e3d4c643d606834f6d26833f706: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.715560882Z" level=error msg="collecting stats for 4294341560fdc45d31e92570afde7b676a82c76701dd1c4fc156d0676df28c1a: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.718324232Z" level=error msg="collecting stats for 18c56821fc92e316b3e39fb6200a391dffee2b225ae0e3660e4034d8a623b644: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.757697729Z" level=error msg="collecting stats for e60ab9cce62b6060fa5ac06ceb5c4939528d06705ce04f15d6a8657442347a6c: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.809740245Z" level=error msg="collecting stats for 1f6aaec1658016194004064150be811849117c7327b8c51bedb75012b2ad707a: invalid id: "
Aug 17 19:40:22 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:22.982380108Z" level=error msg="collecting stats for 96045e7afd3ca551e1b582037bb3b43dc3a8fd169ae92c0f7ab351eca538f63b: invalid id: "
Aug 17 19:40:23 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:23.038991487Z" level=error msg="collecting stats for 379205c02518930cbd2d3d588d1c532acc52cc84a7e7f70d6b6708e057498c44: invalid id: "
Aug 17 19:40:23 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:23.105729662Z" level=error msg="collecting stats for 3159cb1247ca615a5c4e5e81d094903b52676a267af0800ed684540a9922b776: invalid id: "
Aug 17 19:40:23 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:23.164253373Z" level=error msg="collecting stats for 29ee26c4768e114e020faa9dbe143e3d7460c1f3348332665574bb20f231e962: invalid id: "
Aug 17 19:40:23 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:23.193878337Z" level=error msg="collecting stats for 3dcf53191563bc316a3133fb963aa5076fab5ea002bbe52cdc5d0b55d970bc01: invalid id: "
Aug 17 19:40:23 ip-172-20-208-90 docker[1124]: time="2016-08-17T19:40:23.195985819Z" level=error msg="collecting stats for 68938cd3fe4188133d7d724cf6b8180a7680a0bd4f64093f95407325a33cc429: invalid id: "

Relevant information, I'm running k8s 1.3.4, provisioned with kops on AWS. My daemonset config is basically straight from Sematext and looks like:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
 name: sematext-agent
spec:
 template:
   metadata:
     labels:
       app: sematext-agent
   spec:
     nodeSelector: {}
     dnsPolicy: "ClusterFirst"
     restartPolicy: "Always"
     containers:
     - name: sematext-agent
       image: sematext/sematext-agent-docker:latest
       imagePullPolicy: "Always"
       env:
       - name: SPM_TOKEN
         value: "redacted"
       - name: LOGSENE_TOKEN
         value: "redacted"
       - name: KUBERNETES
         value: "1"
       volumeMounts:
         - mountPath: /var/run/docker.sock
           name: docker-sock
         - mountPath: /etc/localtime
           name: localtime
       securityContext:
         privileged: true
     volumes:
       - name: docker-sock
         hostPath:
           path: /var/run/docker.sock
       - name: localtime
         hostPath:
           path: /etc/localtime

Any help here would be appreciated, thanks!

robscott commented 8 years ago

One other important note here, when this was running, logs were still getting through to Sematext.

megastef commented 8 years ago

How long live this containers? Are those containers from the same image? You could exclude images from stats collection with SPM_SKIP_BY_IMAGE=mybatchjobimagename.

Thanks for reporting, i was not aware that docker daemon is reporting failed stats requests. We might need to check if there is a cache used / or not updated by docker events.

megastef commented 8 years ago

Please check the sematext-agent-docker:dev image as well, to see if it changes anything.

robscott commented 8 years ago

Thanks for the ideas. All of our containers are long living right now, we aren't using anything like batch jobs. For what it's worth, our newest pod on this cluster is 20 hours old and our oldest is 7 days old. I've tried the following image versions now with identical results:

sematext/sematext-agent-docker:1.30.9 sematext/sematext-agent-docker:dev sematext/sematext-agent-docker:latest

Removing our SPM_TOKEN solved the problem immediately, but obviously means we don't have that monitoring data going to SPM. Anything else I can try to help debug this?

megastef commented 8 years ago

We will try to reproduce it (not successfully yet). We use https://www.npmjs.com/package/docker-stats - this module streams metrics and reacts on start/stop events of containers. There is no additional cache involved as far I see. So it would be interesting to see the events of the related containers - maybe not all relevant events are handled correctly. How are this containers terminated (sequence of docker events)? You could find it in SPM events view (stop, kill, destroy, pause, ... events)

megastef commented 8 years ago

@robscott can we close this? Check 1.30.20

robscott commented 8 years ago

Looks like this is actually probably more related to a bug fixed in Docker 1.12. I did try the updated image but that didn't make a difference for us. Regardlessly, I don't think it was in issue with sematext-agent-docker after all. My initial testing just had actions that unfortunately lined up with other underlying changes. Thanks for looking into this for us, sorry for the false alarm.

megastef commented 8 years ago

@robscott Thx for the feedback!