ibm-messaging / mq-golang

Calling IBM MQ from Go applications
Apache License 2.0

Prometheus process stopped with error #108

Closed salapat11 closed 5 years ago

salapat11 commented 5 years ago

The Prometheus exporter process exited with the error below under heavy load on a queue manager. This happened with v4.0.6; after reverting the code to v3.3.3 there was no error.

signal 11 received but handler not on signal stack
fatal error: non-Go code set up signal handler without SA_ONSTACK flag

runtime stack:
runtime: unexpected return pc for runtime.sigtramp called from 0x7f9bd7e20307

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
	/usr/lib/go-1.10/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc42004afe8 sp=0xc42004afe0 pc=0x45d5b1

ibmmqmet commented 5 years ago

There's usually a lot more information in a stack dump.

salapat11 commented 5 years ago

Here it is -

IBM MQ metrics exporter for Prometheus monitoring
Build        : 20190607-144301
Commit Level : 9c2ecb3

signal 11 received but handler not on signal stack
fatal error: non-Go code set up signal handler without SA_ONSTACK flag

runtime stack:
runtime: unexpected return pc for runtime.sigtramp called from 0x7f9bd7e20307
stack: frame={sp:0xc4205b4368, fp:0xc4205b43c0} stack=[0xc4205ac2b8,0xc4205b46b8)
000000c4205b4268:  000000c4205b4288  0000000000431521 <runtime.throw+129>
000000c4205b4278:  0000000000000000  0000000000000039
000000c4205b4288:  000000c4205b42a8  000000000044613f <runtime.sigNotOnStack+127>
000000c4205b4298:  000000000088a391  0000000000000039
000000c4205b42a8:  000000c4205b4358  00000000004459c6 <runtime.sigtrampgo+790>
000000c4205b42b8:  000000000000000b  000000c4205b4340
000000c4205b42c8:  000000c4205b45c0  ffffffffffffff00
000000c4205b42d8:  0000000000000000  000000c4205b4368
000000c4205b42e8:  0000000000000000  0000000000000000
000000c4205b42f8:  0000000000000000  0000000000000000
000000c4205b4308:  0000000000000000  000000c4203f9680
000000c4205b4318:  0000000000000000  000000c420c65850
000000c4205b4328:  0000000000000000  0000000000000000
000000c4205b4338:  0000000000000000  000000c420388000
000000c4205b4348:  0000000000000000  0000000000008000
000000c4205b4358:  000000c4205b43b0  000000000045e8e3 <runtime.sigtramp+67>
000000c4205b4368: <000000000000000b  000000c4205b46f0
000000c4205b4378:  000000c4205b45c0  000000c4205b45c0
000000c4205b4388:  000000c4205b46f0  0000000000000000
000000c4205b4398:  000000000000000b  000000c4205b43b0
000000c4205b43a8:  00007f9bd86efa60  00007f9bbc0008c0
000000c4205b43b8: !00007f9bd7e20307 >000000c4205b4490
000000c4205b43c8:  000000c4205b43d0  0000000000000020
000000c4205b43d8:  000000c4205b4401  0000000000000020
000000c4205b43e8:  0000000000000000  000000c42105f000
000000c4205b43f8:  000000c42105f000  000000000042bdfc <runtime.mProf_Malloc+396>
000000c4205b4408:  000000c4205b4558  00007f9bd90212b0
000000c4205b4418:  000000c4205b4450  0000000000000008
000000c4205b4428:  0000000000000020  000000c4206b7d01
000000c4205b4438:  00007f9bd9021240  00000eb420b82cb0
000000c4205b4448:  0000000000000008  00000000004e7566 <encoding/binary.Read+118>
000000c4205b4458:  0000000000000000  0000000000000000
000000c4205b4468:  0000000000000000  0000000000000018
000000c4205b4478:  0000000000000012  0000000000000100
000000c4205b4488:  00007f9bd7e210ea  0000000000000000
000000c4205b4498:  0000000000000000  0000000000000000
000000c4205b44a8:  0000000000000000  0000000000000000
000000c4205b44b8:  000000c4205b4518
runtime.throw(0x88a391, 0x39)
	/usr/lib/go-1.10/src/runtime/panic.go:616 +0x81
runtime.sigNotOnStack(0xb)
	/usr/lib/go-1.10/src/runtime/signal_unix.go:578 +0x7f
runtime.sigtrampgo(0xb, 0xc4205b46f0, 0xc4205b45c0)
	/usr/lib/go-1.10/src/runtime/signal_unix.go:332 +0x316
runtime: unexpected return pc for runtime.sigtramp called from 0x7f9bd7e20307
stack: frame={sp:0xc4205b4368, fp:0xc4205b43c0} stack=[0xc4205ac2b8,0xc4205b46b8)
000000c4205b4268:  000000c4205b4288  0000000000431521 <runtime.throw+129>
000000c4205b4278:  0000000000000000  0000000000000039
000000c4205b4288:  000000c4205b42a8  000000000044613f <runtime.sigNotOnStack+127>
000000c4205b4298:  000000000088a391  0000000000000039
000000c4205b42a8:  000000c4205b4358  00000000004459c6 <runtime.sigtrampgo+790>
000000c4205b42b8:  000000000000000b  000000c4205b4340
000000c4205b42c8:  000000c4205b45c0  ffffffffffffff00
000000c4205b42d8:  0000000000000000  000000c4205b4368
000000c4205b42e8:  0000000000000000  0000000000000000
000000c4205b42f8:  0000000000000000  0000000000000000
000000c4205b4308:  0000000000000000  000000c4203f9680
000000c4205b4318:  0000000000000000  000000c420c65850
000000c4205b4328:  0000000000000000  0000000000000000
000000c4205b4338:  0000000000000000  000000c420388000
000000c4205b4348:  0000000000000000  0000000000008000
000000c4205b4358:  000000c4205b43b0  000000000045e8e3 <runtime.sigtramp+67>
000000c4205b4368: <000000000000000b  000000c4205b46f0
000000c4205b4378:  000000c4205b45c0  000000c4205b45c0
000000c4205b4388:  000000c4205b46f0  0000000000000000
000000c4205b4398:  000000000000000b  000000c4205b43b0
000000c4205b43a8:  00007f9bd86efa60  00007f9bbc0008c0
000000c4205b43b8: !00007f9bd7e20307 >000000c4205b4490
000000c4205b43c8:  000000c4205b43d0  0000000000000020
000000c4205b43d8:  000000c4205b4401  0000000000000020
000000c4205b43e8:  0000000000000000  000000c42105f000
000000c4205b43f8:  000000c42105f000  000000000042bdfc <runtime.mProf_Malloc+396>
000000c4205b4408:  000000c4205b4558  00007f9bd90212b0
000000c4205b4418:  000000c4205b4450  0000000000000008
000000c4205b4428:  0000000000000020  000000c4206b7d01
000000c4205b4438:  00007f9bd9021240  00000eb420b82cb0
000000c4205b4448:  0000000000000008  00000000004e7566 <encoding/binary.Read+118>
000000c4205b4458:  0000000000000000  0000000000000000
000000c4205b4468:  0000000000000000  0000000000000018
000000c4205b4478:  0000000000000012  0000000000000100
000000c4205b4488:  00007f9bd7e210ea  0000000000000000
000000c4205b4498:  0000000000000000  0000000000000000
000000c4205b44a8:  0000000000000000  0000000000000000
000000c4205b44b8:  000000c4205b4518
runtime.sigtramp(0xc4205b4490, 0xc4205b43d0, 0x20, 0xc4205b4401, 0x20, 0x0, 0xc42105f000, 0xc42105f000, 0x42bdfc, 0xc4205b4558, ...)
	/usr/lib/go-1.10/src/runtime/sys_linux_amd64.s:352 +0x43

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
	/usr/lib/go-1.10/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc42004afe8 sp=0xc42004afe0 pc=0x45d5b1

goroutine 1 [IO wait, 118 minutes]:
internal/poll.runtime_pollWait(0x7f9bd905ff00, 0x72, 0x0)
	/usr/lib/go-1.10/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420126198, 0x72, 0xc420066000, 0x0, 0x0)
	/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc420126198, 0xffffffffffffff00, 0x0, 0x0)
	/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc420126180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/lib/go-1.10/src/internal/poll/fd_unix.go:372 +0x1a8
net.(*netFD).accept(0xc420126180, 0xc42045a080, 0xc42005bce8, 0x408688)
	/usr/lib/go-1.10/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc42000e7a8, 0xc42005bd18, 0x4075e7, 0xc42045a080)
	/usr/lib/go-1.10/src/net/tcpsock_posix.go:136 +0x2e
net.(*TCPListener).AcceptTCP(0xc42000e7a8, 0xc42005bd60, 0xc42005bd68, 0x18)
	/usr/lib/go-1.10/src/net/tcpsock.go:246 +0x49
net/http.tcpKeepAliveListener.Accept(0xc42000e7a8, 0x891808, 0xc42045a000, 0x8c7ea0, 0xc42022dc20)
	/usr/lib/go-1.10/src/net/http/server.go:3219 +0x2f
net/http.(*Server).Serve(0xc420085c70, 0x8c7b60, 0xc42000e7a8, 0x0, 0x0)
	/usr/lib/go-1.10/src/net/http/server.go:2773 +0x1a5
net/http.(*Server).ListenAndServe(0xc420085c70, 0xc420085c70, 0x5)
	/usr/lib/go-1.10/src/net/http/server.go:2714 +0xa9
net/http.ListenAndServe(0xc4201e7b30, 0x5, 0x0, 0x0, 0x4, 0xc4201e7b30)
	/usr/lib/go-1.10/src/net/http/server.go:2972 +0x7a
main.main()
	/go/src/github.com/ibm-messaging/mq-metric-samples/cmd/mq_prometheus/main.go:120 +0x408

goroutine 16 [select]:
github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather(0xc420088460, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus/registry.go:501 +0xbba
github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus.UninstrumentedHandler.func1(0x8c71e0, 0xc4203f6008, 0xc42009cc00)
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus/http.go:63 +0x47
net/http.HandlerFunc.ServeHTTP(0x891558, 0x8c71e0, 0xc4203f6008, 0xc42009cc00)
	/usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
net/http.(Handler).ServeHTTP-fm(0x8c71e0, 0xc4203f6008, 0xc42009cc00)
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus/http.go:132 +0x4d
github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus.InstrumentHandlerFuncWithOpts.func1(0x8c77a0, 0xc42017e000, 0xc42009cc00)
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus/http.go:259 +0x22f
net/http.HandlerFunc.ServeHTTP(0xc420089220, 0x8c77a0, 0xc42017e000, 0xc42009cc00)
	/usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
net/http.(*ServeMux).ServeHTTP(0xb14760, 0x8c77a0, 0xc42017e000, 0xc42009cc00)
	/usr/lib/go-1.10/src/net/http/server.go:2340 +0x130
net/http.serverHandler.ServeHTTP(0xc420085c70, 0x8c77a0, 0xc42017e000, 0xc42009cc00)
	/usr/lib/go-1.10/src/net/http/server.go:2697 +0xbc
net/http.(*conn).serve(0xc420010280, 0x8c7de0, 0xc420066140)
	/usr/lib/go-1.10/src/net/http/server.go:1830 +0x651
created by net/http.(*Server).Serve
	/usr/lib/go-1.10/src/net/http/server.go:2798 +0x27b

goroutine 41 [IO wait]:
internal/poll.runtime_pollWait(0x7f9bd905fbc0, 0x72, 0xc420059868)
	/usr/lib/go-1.10/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc4203f2318, 0x72, 0xffffffffffffff00, 0x8c4a40, 0xae6160)
	/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc4203f2318, 0xc4200c8000, 0x1000, 0x1000)
	/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc4203f2300, 0xc4200c8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/lib/go-1.10/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc4203f2300, 0xc4200c8000, 0x1000, 0x1000, 0xc420059978, 0x7025ca, 0xc4203c18f8)
	/usr/lib/go-1.10/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc4203f6218, 0xc4200c8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/lib/go-1.10/src/net/net.go:176 +0x6a
net/http.(*connReader).Read(0xc4203c18f0, 0xc4200c8000, 0x1000, 0x1000, 0x1, 0xc420059a58, 0x3)
	/usr/lib/go-1.10/src/net/http/server.go:764 +0xf8
bufio.(*Reader).fill(0xc42032a000)
	/usr/lib/go-1.10/src/bufio/bufio.go:100 +0x11e
bufio.(*Reader).ReadSlice(0xc42032a000, 0xc4205be40a, 0xc420059ab0, 0x425658, 0xc420029770, 0x7f9bd8ff81b0, 0x7f9bd90bb601)
	/usr/lib/go-1.10/src/bufio/bufio.go:341 +0x2c
bufio.(*Reader).ReadLine(0xc42032a000, 0xc4205be500, 0x100, 0xf8, 0x868dc0, 0xc42073e000, 0xc42032a000)
	/usr/lib/go-1.10/src/bufio/bufio.go:370 +0x34
net/textproto.(*Reader).readLineSlice(0xc42073e000, 0x57, 0xc420059ba8, 0x417d08, 0x100, 0x868dc0)
	/usr/lib/go-1.10/src/net/textproto/reader.go:55 +0x70
net/textproto.(*Reader).ReadLine(0xc42073e000, 0xc4205be500, 0x0, 0x0, 0x72)
	/usr/lib/go-1.10/src/net/textproto/reader.go:36 +0x2b
net/http.readRequest(0xc42032a000, 0x0, 0xc4205be500, 0x0, 0x0)
	/usr/lib/go-1.10/src/net/http/request.go:929 +0x8c
net/http.(*conn).readRequest(0xc4200103c0, 0x8c7de0, 0xc42054c200, 0x0, 0x0, 0x0)
	/usr/lib/go-1.10/src/net/http/server.go:944 +0x16f
net/http.(*conn).serve(0xc4200103c0, 0x8c7de0, 0xc42054c200)
	/usr/lib/go-1.10/src/net/http/server.go:1768 +0x4dc
created by net/http.(*Server).Serve
	/usr/lib/go-1.10/src/net/http/server.go:2798 +0x27b

goroutine 24124 [running]:
	goroutine running on other thread; stack unavailable
created by github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus/registry.go:526 +0xe90

goroutine 24134 [IO wait]:
internal/poll.runtime_pollWait(0x7f9bd905fc90, 0x72, 0xc4203fde58)
	/usr/lib/go-1.10/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc4203f2098, 0x72, 0xffffffffffffff00, 0x8c4a40, 0xae6160)
	/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc4203f2098, 0xc4203c0200, 0x1, 0x1)
	/usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc4203f2080, 0xc4203c02e1, 0x1, 0x1, 0x0, 0x0, 0x0)
	/usr/lib/go-1.10/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc4203f2080, 0xc4203c02e1, 0x1, 0x1, 0xc4205182a0, 0x0, 0xc42054c298)
	/usr/lib/go-1.10/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc42000e030, 0xc4203c02e1, 0x1, 0x1, 0x0, 0x0, 0x0)
	/usr/lib/go-1.10/src/net/net.go:176 +0x6a
net/http.(*connReader).backgroundRead(0xc4203c02d0)
	/usr/lib/go-1.10/src/net/http/server.go:668 +0x5a
created by net/http.(*connReader).startBackgroundRead
	/usr/lib/go-1.10/src/net/http/server.go:664 +0xce

goroutine 24137 [semacquire]:
sync.runtime_Semacquire(0xc42125a08c)
	/usr/lib/go-1.10/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc42125a080)
	/usr/lib/go-1.10/src/sync/waitgroup.go:129 +0x72
github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc42125a080, 0xc420344000, 0xc420344060)
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus/registry.go:451 +0x2b
created by github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus/registry.go:450 +0x5d8

ibmmqmet commented 5 years ago

There's nothing obvious in there, and it's not something I've seen before, but there might be a way to get better stack information.

Setting an environment variable disables some of MQ's signal handling:

export MQS_NO_SYNC_SIGNAL_HANDLING=true

Put that somewhere in the monitor's startup script so it gets into the program's environment. It should at least stop the message about the SA_ONSTACK flag, and hopefully give a more useful dump if it crashes again.
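If editing the startup script is inconvenient, the variable can also be set from the exporter code itself. A minimal sketch, assuming the MQ client library only reads this variable when the first connection is made (with cgo builds, os.Setenv also updates the C environment the library sees):

package main

import "os"

// Hypothetical alternative to the startup-script export: set the variable
// early, before anything touches the MQ client library, so the library
// skips installing its own synchronous signal handlers.
func init() {
	os.Setenv("MQS_NO_SYNC_SIGNAL_HANDLING", "true")
}

func main() {
	// ... connect to the queue manager and start the exporter as usual ...
}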

salapat11 commented 5 years ago

Thank you. Will set the variable and try to reproduce.

salapat11 commented 5 years ago

The above variable was set in the Prometheus startup script, and below is the dump produced when the queue manager was under load. There were 1200 channel connections at the time this happened.

IBM MQ metrics exporter for Prometheus monitoring
Build        : 20190607-144301
Commit Level : 9c2ecb3

2019/06/18 09:07:41 http: multiple response.WriteHeader calls
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x51fe1a]

goroutine 1623 [running]:
github.com/ibm-messaging/mq-metric-samples/vendor/github.com/ibm-messaging/mq-golang/mqmetric.parseChlData(0x3f3, 0x0, 0xc42147a800, 0x0, 0x2800, 0x8c39c0, 0xc420e982a0)
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/ibm-messaging/mq-golang/mqmetric/channel.go:255 +0x3a
github.com/ibm-messaging/mq-metric-samples/vendor/github.com/ibm-messaging/mq-golang/mqmetric.collectChannelStatus(0x7ffd81951351, 0x1, 0x7ffd000003f3, 0x1, 0xc420d20670)
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/ibm-messaging/mq-golang/mqmetric/channel.go:226 +0x3eb
github.com/ibm-messaging/mq-metric-samples/vendor/github.com/ibm-messaging/mq-golang/mqmetric.CollectChannelStatus(0x7ffd81951351, 0x1, 0x87649d, 0x4)
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/ibm-messaging/mq-golang/mqmetric/channel.go:154 +0x231
main.(*exporter).Collect(0xc42037bc20, 0xc420e2c060)
	/go/src/github.com/ibm-messaging/mq-metric-samples/cmd/mq_prometheus/exporter.go:174 +0x39e3
github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x199
created by github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/go/src/github.com/ibm-messaging/mq-metric-samples/vendor/github.com/prometheus/client_golang/prometheus/registry.go:445 +0x599

ibmmqmet commented 5 years ago

v4.0.7 has more error checking that I hope will stop this happening.
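For reference, a minimal sketch of the kind of defensive check that addresses this class of crash; the type and function names here are illustrative stand-ins, not the actual v4.0.7 change. The trace above shows parseChlData entered with a nil PCF header argument, so a guard before the first dereference turns the panic into a skipped reply:

package main

import "fmt"

// pcfHeader is an illustrative stand-in for the PCF response header
// (MQCFH) that mqmetric parses; the real code uses the ibmmq types.
type pcfHeader struct {
	Reason         int32
	ParameterCount int32
}

// parseStatusReply bails out early when a busy queue manager returns an
// error reply, a truncated buffer, or no header at all, instead of
// dereferencing a nil pointer.
func parseStatusReply(cfh *pcfHeader, buf []byte) error {
	if cfh == nil {
		return fmt.Errorf("no PCF header in reply")
	}
	if cfh.Reason != 0 || cfh.ParameterCount == 0 || len(buf) == 0 {
		return fmt.Errorf("unparseable reply: reason %d", cfh.Reason)
	}
	// ... iterate over the PCF parameters in buf here ...
	return nil
}

func main() {
	// Under load this situation previously crashed the exporter;
	// with the guard it is reported and the reply is skipped.
	if err := parseStatusReply(nil, nil); err != nil {
		fmt.Println("skipping reply:", err)
	}
}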

salapat11 commented 5 years ago

Thank you. I will give it a try.