raintank / raintank-collector

Raintank Remote Collector Agent
26 stars 4 forks source link

a few errors #35

Closed Dieterbe closed 8 years ago

Dieterbe commented 8 years ago

no idea what these are about. i'm just running devstack, with 1 endpoint set to hostname grafana, all default settings.

it was running fine until i saw gaps in the graph and noticed a bunch of different errors in the collector log:

[2015-12-02 01:47:33.063] [DEBUG] PID:121 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 01:47:43.062] [DEBUG] PID:122 - Processing 1 metrics/second, 0 events/second from 2 checks
[2015-12-02 01:47:43.064] [DEBUG] PID:121 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 01:47:53.041] [ERROR] PID:121 - controller emitted an error -  { [Error: server error] code: 'parser error' }
Error: server error
    at Socket.onPacket (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/lib/socket.js:424:19)
    at WS.<anonymous> (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/lib/socket.js:248:10)
    at WS.Emitter.emit (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/component-emitter/index.js:134:20)
    at WS.Transport.onPacket (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/lib/transport.js:147:8)
    at WS.Transport.onData (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/lib/transport.js:139:8)
    at WebSocket.ws.onmessage (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/lib/transports/websocket.js:112:10)
    at WebSocket.onMessage (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/WebSocket.js:414:14)
    at WebSocket.EventEmitter.emit (events.js:98:17)
    at Receiver.ontext (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/WebSocket.js:797:10)
    at /opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/Receiver.js:473:18
[2015-12-02 01:47:53.044] [INFO] PID:121 - disconnected from socket.io server. io client disconnect
[2015-12-02 01:47:53.046] [ERROR] PID:115 - worker 121 died, restarting.
[2015-12-02 01:47:53.046] [INFO] PID:115 - 1 restarts in last 2 minutes.

...

[2015-12-02 02:09:33.146] [DEBUG] PID:122 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 02:09:33.275] [DEBUG] PID:137 - Processing 0 metrics/second, 0 events/second from 2 checks
[2015-12-02 02:09:43.146] [DEBUG] PID:122 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 02:09:43.276] [DEBUG] PID:137 - Processing 0 metrics/second, 0 events/second from 2 checks
[2015-12-02 02:09:53.147] [DEBUG] PID:122 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 02:09:53.277] [DEBUG] PID:137 - Processing 0 metrics/second, 0 events/second from 2 checks
[2015-12-02 02:10:03.148] [DEBUG] PID:122 - Processing 0.6 metrics/second, 0 events/second from 1 checks
[2015-12-02 02:10:03.278] [DEBUG] PID:137 - Processing 0 metrics/second, 0 events/second from 2 checks
[2015-12-02 02:10:13.148] [DEBUG] PID:122 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 02:10:13.279] [DEBUG] PID:137 - Processing 0 metrics/second, 0 events/second from 2 checks
[2015-12-02 02:10:23.064] [ERROR] PID:122 - controller emitted an error -  { [Error: websocket error] description: 'invalid utf8 sequence' }
Error: websocket error
    at WS.Transport.onError (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/lib/transport.js:66:13)
    at WebSocket.ws.onerror (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/lib/transports/websocket.js:115:10)
    at WebSocket.onError (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/WebSocket.js:423:14)
    at WebSocket.EventEmitter.emit (events.js:98:17)
    at Receiver.onerror (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/WebSocket.js:832:10)
    at Receiver.error (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/Receiver.js:317:8)
    at /opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/Receiver.js:470:20
    at Receiver.applyExtensions (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/Receiver.js:360:5)
    at /opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/Receiver.js:462:14
    at Receiver.flush (/opt/raintank/raintank-collector/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/ws/lib/Receiver.js:336:3)
[2015-12-02 02:10:23.066] [INFO] PID:122 - disconnected from socket.io server. io client disconnect
[2015-12-02 02:10:23.068] [ERROR] PID:115 - worker 122 died, restarting.
[2015-12-02 02:10:23.068] [INFO] PID:115 - 1 restarts in last 2 minutes.
[2015-12-02 02:10:23.073] [DEBUG] PID:137 - refreshing checks: check count: 3
[2015-12-02 02:10:23.179] [INFO] PID:143 - connected to socket.io server
[2015-12-02 02:10:23.221] [DEBUG] PID:137 - refreshing checks: check count: 1
[2015-12-02 02:10:23.221] [DEBUG] PID:137 - removing check 1 from run queue
[2015-12-02 02:10:23.221] [DEBUG] PID:137 - removing check 3 from run queue
[2015-12-02 02:10:23.222] [INFO] PID:143 - received ready event from controller
[2015-12-02 02:10:23.222] [INFO] PID:143 - collector.enabled is: true
[2015-12-02 02:10:23.223] [DEBUG] PID:143 - refreshing checks: check count: 2
[2015-12-02 02:10:23.279] [DEBUG] PID:137 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 02:10:33.174] [DEBUG] PID:143 - Processing 1 metrics/second, 0.1 events/second from 2 checks
[2015-12-02 02:10:33.280] [DEBUG] PID:137 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 02:10:43.175] [DEBUG] PID:143 - Processing 1 metrics/second, 0 events/second from 2 checks
[2015-12-02 02:10:43.282] [DEBUG] PID:137 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 02:10:53.176] [DEBUG] PID:143 - Processing 1 metrics/second, 0 events/second from 2 checks
[2015-12-02 02:10:53.282] [DEBUG] PID:137 - Processing 0 metrics/second, 0 events/second from 1 checks
[2015-12-02 02:11:03.176] [DEBUG] PID:143 - Processing 2.3 metrics/second, 0.1 events/second from 2 checks

...

[2015-12-02 02:22:01.495] [ERROR] PID:115 - panic: send on closed channel

goroutine 1411 [running]:
github.com/raintank/go-pinger.(*Pinger).listenIpv4(0xc8200122c0)
        /go/src/github.com/raintank/go-pinger/pinger.go:220 +0xae4

[2015-12-02 02:22:01.495] [ERROR] PID:115 - created by github.com/raintank/go-pinger.(*Pinger).start
        /go/src/github.com/raintank/go-pinger/pinger.go:172 +0x110

goroutine 1 [IO wait]:
net.runtime_pollWait(0x7fd2ec4516b0, 0x72, 0xc820010140)
        /usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc820058140, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc820058140, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).accept(0xc8200580e0, 0x0, 0x7fd2ec4517a8, 0xc8200e89e0)
        /usr/local/go/src/net/fd_unix.go:408 +0x27c
net.(*TCPListener).AcceptTCP(0xc82002c030, 0x450230, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock_posix.go:254 +0x4d
net/http.tcpKeepAliveListener.Accept(0xc82002c030, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2135 +0x41
net/http.(*Server).Serve(0xc820014420, 0x7fd2ec451770, 0xc82002c030, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:1887 +0xb3
net/http.(*Server).ListenAndServe(0xc820014420, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:1877 +0x136
net/http.ListenAndServe(0xc820011070, 0xe, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:1967 +0x8f
main.main()
        /go/src/github.com/raintank/raintank-probe/main.go:26 +0x2f9

goroutine 17 [syscall, 9 minutes, locked to thread]:
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 1387 [IO wait]:
net.runtime_pollWait(0x7fd2ec451530, 0x72, 0xc820010140)
        /usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc8201004c0, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc8201004c0, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc820100460, 0xc820118800, 0x200, 0x200, 0x0, 0x7fd2ec44c050, 0xc820010140)
        /usr/local/go/src/net/fd_unix.go:232 +0x23a
net.(*conn).Read(0xc8200fa070, 0xc820118800, 0x200, 0x200, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:172 +0xe4
[2015-12-02 02:22:01.495] [ERROR] PID:115 - github.com/miekg/dns.(*Conn).Read(0xc8201481e0, 0xc820118800, 0x200, 0x200, 0x200, 0x0, 0x0)
        /go/src/github.com/miekg/dns/client.go:312 +0x21b
github.com/miekg/dns.(*Conn).ReadMsgHeader(0xc8201481e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/miekg/dns/client.go:242 +0x39b
github.com/miekg/dns.(*Conn).ReadMsg(0xc8201481e0, 0xecdf04ccb, 0x0, 0x0)
        /go/src/github.com/miekg/dns/client.go:193 +0x4b
github.com/miekg/dns.(*Client).exchange(0xc820148190, 0xc82012bec8, 0xc8200e3430, 0xa, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/miekg/dns/client.go:182 +0x47a
github.com/miekg/dns.(*Client).Exchange(0xc820148190, 0xc82012bec8, 0xc8200e3430, 0xa, 0x1, 0xc8200e33e0, 0x0, 0x0)
        /go/src/github.com/miekg/dns/client.go:109 +0x74
github.com/raintank/raintank-probe/checks.(*RaintankProbeDns).Run(0xc820014840, 0x0, 0x0)
        /go/src/github.com/raintank/raintank-probe/checks/dns.go:115 +0x7e8
main.RunCheck.func1(0xc8200148a0, 0x7fd2ec451d18, 0xc820014840)
        /go/src/github.com/raintank/raintank-probe/main.go:89 +0x27
created by main.RunCheck
        /go/src/github.com/raintank/raintank-probe/main.go:90 +0x86

goroutine 1386 [chan receive]:
main.RunCheck(0x7fd2ec451d18, 0xc820014840, 0x0,
[2015-12-02 02:22:01.495] [ERROR] PID:115 - 0x0, 0x0, 0x0)
        /go/src/github.com/raintank/raintank-probe/main.go:92 +0xb6
main.handler(0x7fd2ec451978, 0xc8200be630, 0xc8200fe620)
        /go/src/github.com/raintank/raintank-probe/main.go:45 +0x210
net/http.HandlerFunc.ServeHTTP(0x906620, 0x7fd2ec451978, 0xc8200be630, 0xc8200fe620)
        /usr/local/go/src/net/http/server.go:1422 +0x3a
net/http.(*ServeMux).ServeHTTP(0xc82000aa50, 0x7fd2ec451978, 0xc8200be630, 0xc8200fe620)
        /usr/local/go/src/net/http/server.go:1699 +0x17d
net/http.serverHandler.ServeHTTP(0xc820014420, 0x7fd2ec451978, 0xc8200be630, 0xc8200fe620)
        /usr/local/go/src/net/http/server.go:1862 +0x19e
net/http.(*conn).serve(0xc8200be580)
        /usr/local/go/src/net/http/server.go:1361 +0xbee
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:1910 +0x3f6

goroutine 1377 [runnable]:
main.RunCheck(0x7fd2ec451ac0, 0xc8200e8a40, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/raintank/raintank-probe/main.go:92 +0xb6
main.handler(0x7fd2ec451978, 0xc82009a630, 0xc820590380)
        /go/src/github.com/raintank/raintank-probe/main.go:45 +0x210
net/http.HandlerFunc.ServeHTTP(0x906620, 0x7fd2ec451978, 0xc82009a630, 0xc820590380)

[2015-12-02 02:22:01.495] [ERROR] PID:115 - /usr/local/go/src/net/http/server.go:1422 +0x3a
net/http.(*ServeMux).ServeHTTP(0xc82000aa50, 0x7fd2ec451978, 0xc82009a630, 0xc820590380)
        /usr/local/go/src/net/http/server.go:1699 +0x17d
net/http.serverHandler.ServeHTTP(0xc820014420, 0x7fd2ec451978, 0xc82009a630, 0xc820590380)
        /usr/local/go/src/net/http/server.go:1862 +0x19e
net/http.(*conn).serve(0xc82009a580)
        /usr/local/go/src/net/http/server.go:1361 +0xbee
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:1910 +0x3f6

[2015-12-02 02:22:01.496] [ERROR] PID:137 - Got error: socket hang up
[2015-12-02 02:22:01.496] [ERROR] PID:143 - Got error: socket hang up
[2015-12-02 02:22:01.497] [ERROR] PID:115 - raintank-probe server terminated.
[2015-12-02 02:22:01.958] [ERROR] PID:143 - Got error: connect ECONNREFUSED
[2015-12-02 02:22:02.497] [INFO] PID:115 - starting up raintank-probe server
[2015-12-02 02:22:02.500] [INFO] PID:115 - raintank-probe server starting up on port 8080

[2015-12-02 02:22:03.234] [DEBUG] PID:143 - Processing 0.8 metrics/second, 0 events/second from 2 checks
[2015-12-02 02:22:03.338] [DEBUG] PID:137 - Processing 0 metrics/second, 0.1 events/second from 1 checks

the first one looked like it may be related to the collector controller, so i also checked the grafana log:

~/g/s/g/r/raintank-docker ❯❯❯ grep '\[E\]' logs/grafana-dev.log
~/g/s/g/r/raintank-docker ❯❯❯ grep '\[W\]' logs/grafana-dev.log
~/g/s/g/r/raintank-docker ❯❯❯                                                                                                                                                                ⏎

~/g/s/g/r/raintank-docker ❯❯❯ grep -i err logs/grafana-dev.log                                                                                                                               ⏎
2015/12/02 01:39:29 [I] connection registered without error
2015/12/02 01:39:29 [I] connection registered without error
2015/12/02 01:39:33 [I] connection registered without error
2015/12/02 01:39:33 [I] connection registered without error
2015/12/02 01:47:53 [I] connection registered without error
2015/12/02 01:49:18 [I] connection registered without error
2015/12/02 02:10:23 [I] connection registered without error
~/g/s/g/r/raintank-docker ❯❯❯ 
woodsaj commented 8 years ago

The first error, is an error event sent from the collector-controller. Essentially it is that the websocket message sent was malformed.

The second error is the reverse, were raintank-collector received an invalid message.

The third message looks to be a race condition, but is only triggered when a ping response is received at the same time that the ping is considered to have timed out. This timeout is 10seconds.

All 3 of these issues indicate that your server has some significant networking problem.

As the problem for the first two issues is due to data corruption while in transit, the crash that happened is expected.

The third issue is in github.com/raintank/go-pinger and ill open an issue there.

Dieterbe commented 8 years ago

interesting. for the record, this was on devstack (raintank-docker) on my laptop. first time i notice this though. i did do a system upgrade the other day, i wonder if it's related. here's some potentially related packages:

[2015-11-30 18:32] [ALPM] upgraded docker (1:1.9.0-1 -> 1:1.9.1-1)
[2015-11-30 18:32] [ALPM] upgraded systemd (227-1 -> 228-3)
...
[2015-11-30 18:32] [ALPM] upgraded network-manager-applet (1.0.6-1 -> 1.0.8-1)
[2015-11-30 18:32] [ALPM] upgraded networkmanager (1.0.7-1 -> 1.0.8-1)
Dieterbe commented 8 years ago

https://projects.archlinux.org/svntogit/community.git/commit/trunk?h=packages/docker&id=cee29e8c13318442d94b4f09c1720b73e51a14fd https://github.com/docker/docker/blob/master/CHANGELOG.md docker udate looks fairly harmless but maybe it's related.