influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

inputs.docker section in telegraf.conf throws errors when uncommented #3283

Closed mamudragon closed 2 years ago

mamudragon commented 7 years ago

Bug report

Trying to collect Docker container metrics using Grafana, InfluxDB, and Telegraf on Windows 10, but Telegraf throws an error.

Relevant telegraf.conf (Docker input plugin section):

[[inputs.docker]]
#   ## Docker Endpoint
#   ##   To use TCP, set endpoint = "tcp://[ip]:[port]"
#   ##   To use environment variables (ie, docker-machine), set endpoint = "ENV"
#   endpoint = "unix:///var/run/docker.sock"
    endpoint = "localhost:2375"
#
#   ## Only collect metrics for these containers, collect all if empty
container_names = []
#
#   ## Containers to include and exclude. Globs accepted.
#   ## Note that an empty array for both will include all containers
container_name_include = []
container_name_exclude = []
#
#   ## Timeout for docker list, info, and stats commands
timeout = "5s"

System info:

Windows 10 Pro 64-bit, Docker CE 17.x, Telegraf from the official repo (v1.3), InfluxDB from the official repo

Steps to reproduce:

  1. Uncommented the inputs.docker section, from the plugin header through the timeout setting, as shown above.
  2. Copied telegraf.conf from the host to the container.
  3. Error 1: the Telegraf container cannot reach InfluxDB on localhost:8086. Log: Database creation failed: Post http://localhost:8086/query?q=CREATE+DATABASE+%22telegraf%22: dial tcp 127.0.0.1:8086: getsockopt: connection refused
  4. Error 2:

     2017/09/29 18:10:25 I! Using config file: /etc/telegraf/telegraf.conf
     2017-09-29T18:10:25Z I! Database creation failed: Post http://localhost:8086/query?q=CREATE+DATABASE+%22telegraf%22: dial tcp 127.0.0.1:8086: getsockopt: connection refused
     2017-09-29T18:10:25Z I! Starting Telegraf v1.4.1
     2017-09-29T18:10:25Z I! Loaded outputs: influxdb
     2017-09-29T18:10:25Z I! Loaded inputs: inputs.kernel inputs.mem inputs.processes inputs.swap inputs.system inputs.cpu inputs.disk inputs.diskio inputs.docker
     2017-09-29T18:10:25Z I! Tags enabled: host=c9a2e52355d4
     2017-09-29T18:10:25Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"c9a2e52355d4", Flush Interval:10s
     panic: runtime error: invalid memory address or nil pointer dereference
     [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0xb3ec17]

Expected behavior:

Telegraf should collect metrics for all running Linux containers on Docker CE (Windows).

Actual behavior:

Telegraf collects only host information/stats

Additional info: docker commands used

docker cp telegraf:\etc\telegraf\telegraf.conf .\deuce\telegraf.conf
cd deuce
docker cp telegraf.conf telegraf:\etc\telegraf\telegraf.conf
docker logs telegraf
docker run -d --net=container:influxdb telegraf

Feature Request

A telegraf.conf file for Windows 10 is needed. A ready-made file for Windows that enables all container metrics should be supplied.

telegraf.conf.txt

danielnelson commented 7 years ago

Can you add the output after this:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0xb3ec17]

The connection error (dial tcp 127.0.0.1:8086: getsockopt: connection refused) doesn't look like an error in Telegraf. It just cannot connect to InfluxDB because nothing is listening on that port.

mamudragon commented 7 years ago

Thanks. The full log from the container's last session is below.

docker logs telegraf

2017/09/29 18:10:25 I! Using config file: /etc/telegraf/telegraf.conf
2017-09-29T18:10:25Z I! Database creation failed: Post http://localhost:8086/query?q=CREATE+DATABASE+%22telegraf%22: dial tcp 127.0.0.1:8086: getsockopt: connection refused
2017-09-29T18:10:25Z I! Starting Telegraf v1.4.1
2017-09-29T18:10:25Z I! Loaded outputs: influxdb
2017-09-29T18:10:25Z I! Loaded inputs: inputs.kernel inputs.mem inputs.processes inputs.swap inputs.system inputs.cpu inputs.disk inputs.diskio inputs.docker
2017-09-29T18:10:25Z I! Tags enabled: host=c9a2e52355d4
2017-09-29T18:10:25Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"c9a2e52355d4", Flush Interval:10s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0xb3ec17]

goroutine 33 [running]:
github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).gatherInfo(0xc420140d80, 0x1c2b160, 0xc42014ab20, 0x0, 0x0)
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:198 +0x157
github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).Gather(0xc420140d80, 0x1c2b160, 0xc42014ab20, 0x0, 0x0)
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:158 +0x94
github.com/influxdata/telegraf/agent.gatherWithTimeout.func1(0xc420058de0, 0xc42004fc00, 0xc42014ab20)
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:153 +0x49
created by github.com/influxdata/telegraf/agent.gatherWithTimeout
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:154 +0xcc

mamudragon commented 7 years ago

goroutine 28 [running]:
github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).gatherInfo(0xc042166480, 0x1c8ae00, 0xc0422788c0, 0x0, 0x0)
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:198 +0x15e
github.com/influxdata/telegraf/plugins/inputs/docker.(*Docker).Gather(0xc042166480, 0x1c8ae00, 0xc0422788c0, 0x0, 0x0)
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/docker/docker.go:158 +0x9b
github.com/influxdata/telegraf/agent.gatherWithTimeout.func1(0xc04239e2a0, 0xc04210c9c0, 0xc0422788c0)
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:153 +0x50
created by github.com/influxdata/telegraf/agent.gatherWithTimeout
    /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:154 +0xd3

mamudragon commented 7 years ago

Is there an ETA for when this bug will be fixed on Windows? Thanks.

danielnelson commented 7 years ago

I think we will need to run a version of Telegraf with extra logging to determine the cause of the crash. Are you able to build from source, or would you prefer a package or Docker image?

mamudragon commented 7 years ago

I prefer a package and a Docker image (with Docker on Windows). I tried both options, and both produced this same error. Thanks.

danielnelson commented 7 years ago

If you can reproduce the error outside Docker, I think that is preferable: one less variable to worry about. Here is a Windows build with the extra log messages.

mvhconsult commented 7 years ago

Another way to produce the panic is to set endpoint = "/var/run/docker.sock".

The host is Ubuntu 16.04. I tried this because I could not get a connection using the proposed endpoint = "unix:///var/run/docker.sock", which fails with: 05:06:20Z E! Error in plugin [inputs.docker]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
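Both endpoints reported to trigger the panic in this thread ("localhost:2375" and "/var/run/docker.sock") lack the unix:// or tcp:// scheme that the sample config comments describe. The thread does not confirm that the missing scheme is the cause, but checking the value before starting Telegraf is cheap. The sketch below is a standalone Go illustration, not Telegraf's own validation: checkEndpoint is a hypothetical helper, and "tcp://localhost:2375" is an assumed example of the tcp://[ip]:[port] form.

```go
package main

import (
	"fmt"
	"strings"
)

// checkEndpoint is a hypothetical helper (not part of Telegraf).
// It accepts the three endpoint forms named in the sample config comments:
// "ENV", "unix://<path>", and "tcp://<host>:<port>". It only checks the
// prefix and that something follows it; it does not contact the daemon.
func checkEndpoint(endpoint string) error {
	if endpoint == "ENV" {
		return nil
	}
	for _, prefix := range []string{"unix://", "tcp://"} {
		if strings.HasPrefix(endpoint, prefix) && len(endpoint) > len(prefix) {
			return nil
		}
	}
	return fmt.Errorf("endpoint %q has no unix:// or tcp:// scheme and is not \"ENV\"", endpoint)
}

func main() {
	endpoints := []string{
		"unix:///var/run/docker.sock", // default from the sample config
		"tcp://localhost:2375",        // assumed TCP form of the reporter's value
		"ENV",
		"localhost:2375",       // value from the original report
		"/var/run/docker.sock", // value from this comment
	}
	for _, e := range endpoints {
		if err := checkEndpoint(e); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("ok:", e)
		}
	}
}
```

A check like this only flags the shape of the value; whether the Docker daemon actually listens on the given socket or port is a separate question.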

danielnelson commented 7 years ago

@mvhconsult Can you provide the stacktrace of the panic?

adityacs commented 6 years ago

@mamudragon Just to clarify: did you bind to localhost with --bind 127.0.0.1 when starting the influxdb and telegraf containers?

sspaink commented 2 years ago

Closing this as there hasn't been any activity in a long time and the stack trace provided doesn't match the current state of the plugin. If anyone is still having this problem, please post a new stack trace using the latest version of Telegraf. Thank you!