influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

telegraf v1.7.3 container fails to run when built from scratch #4533

Closed: thedebugger closed this issue 6 years ago

thedebugger commented 6 years ago

I tried v1.7.3, but earlier versions might also be failing; v1.4.0 was working for us previously. I observed that v1.7.3 is trying to load shared libraries, but v1.4.0 wasn't (see the ldd output below). Did we change how it was compiled? I looked through the changelog but didn't find any related issue.

Relevant telegraf.conf:

Not relevant

System info:

Not relevant.

Steps to reproduce:

  1. Build the following Dockerfile:

         FROM scratch
         # Docker extracts a local tar.gz archive passed to ADD
         ADD telegraf-1.7.3.tar.gz /
         ENTRYPOINT ["/telegraf/usr/bin/telegraf"]
         CMD ["--config", "/telegraf/etc/telegraf/telegraf.conf"]

  2. Run it with docker run "$image".

### Expected behavior:

Telegraf should start normally.

### Actual behavior:
The container fails with the following error:

    panic: standard_init_linux.go:178: exec user process caused "no such file or directory"


### Additional info:
I looked at the telegraf v1.7.3 binary, and it is trying to load shared libraries, including ld-linux-x86-64.so.2, whereas v1.4.0 isn't.

ldd output for v1.7.3 (ignore the Nix store paths):

    $ ldd telegraf-1.7.3/usr/bin/telegraf
            linux-vdso.so.1 (0x00007ffca73f6000)
            libpthread.so.0 => /nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib/libpthread.so.0 (0x00007f1621b12000)
            libc.so.6 => /nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib/libc.so.6 (0x00007f1621760000)
            /lib64/ld-linux-x86-64.so.2 => /nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib64/ld-linux-x86-64.so.2 (0x00007f1621d30000)

ldd output for v1.4.0:

    $ ldd telegraf-1.4.0/telegraf
            not a dynamic executable



Let me know if you need more info. Thanks!
danielnelson commented 6 years ago

It looks like you compiled your 1.4.0 binary without CGO. CGO is enabled by default when you are not cross-compiling, but it can be enabled or disabled explicitly at build time:

    $ CGO_ENABLED=0 make telegraf
    go build -ldflags " -X main.commit=2a4267ed -X main.branch=master" ./cmd/telegraf
    $ ldd telegraf
            not a dynamic executable
    $ CGO_ENABLED=1 make telegraf
    go build -ldflags " -X main.commit=2a4267ed -X main.branch=master" ./cmd/telegraf
    $ ldd telegraf
            linux-vdso.so.1 (0x00007ffee4ffc000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6b2ac62000)
            libc.so.6 => /lib64/libc.so.6 (0x00007f6b2a8a0000)
            /lib64/ld-linux-x86-64.so.2 (0x00007f6b2ae82000)

I'm not sure where you are getting the telegraf tar.gz from, but you could use the "static" build to get a version without CGO: https://dl.influxdata.com/telegraf/releases/telegraf-1.7.3-static_linux_amd64.tar.gz

thedebugger commented 6 years ago

Oh, I wasn't compiling it myself; I had downloaded the non-static binary. Sorry about that. I didn't notice there are two packages: telegraf-1.7.3_linux_amd64.tar.gz and telegraf-1.7.3-static_linux_amd64.tar.gz. I'm curious why the static package isn't the default. Is it because it is bigger?

thedebugger commented 6 years ago

@danielnelson Thank you for explaining CGO; I wasn't aware of it.

danielnelson commented 6 years ago

The main reason we keep CGO enabled on the main build, other than it simply being the default, is that it allows Telegraf to use the system DNS resolver functions from the C library. Those functions handle some resolver configurations that Go's built-in resolver does not implement (for example, certain /etc/nsswitch.conf setups). If you want to know more, see the "Name Resolution" section of https://golang.org/pkg/net/