raintank / worldping-api

Worldping Backend Service

alerting job messages not being acked to rabbitmq? #27

Closed woodsaj closed 8 years ago

woodsaj commented 8 years ago

Issue by Dieterbe Tuesday Oct 20, 2015 at 20:19 GMT Originally opened as https://github.com/raintank/grafana/issues/496


how to reproduce:
1) update & spin up dev stack
2) env-load load
3) open http://localhost:15672

Both in the main overview of queued messages and in the individual queues, the number of unacked messages equals the total number of messages, even though the exchanges do say msg in rate = msg out rate and alerting seems to work.

woodsaj commented 8 years ago

Comment by Dieterbe Tuesday Oct 20, 2015 at 20:26 GMT


huh (this is from the alerting dashboard in dev stack)

actually it seems that, even though the rabbit queue is being fed constantly, the workers aren't pulling many jobs out of it. (PS: it would be nice to have the queue graphs from rabbit available in graphite as well; we're not using the internal jobqueue here, we're using rabbit.) Also, this is on a lightly loaded system, with no process using more than 50% cpu.

woodsaj commented 8 years ago

Comment by Dieterbe Tuesday Oct 20, 2015 at 20:31 GMT


@woodsaj @ctdk any idea?

woodsaj commented 8 years ago

Comment by ctdk Tuesday Oct 20, 2015 at 20:45 GMT


@Dieterbe No ideas at the moment, but I'm in there anyway so I'll take a look.

woodsaj commented 8 years ago

Comment by Dieterbe Tuesday Oct 20, 2015 at 22:50 GMT


https://github.com/raintank/raintank-docker/issues/41 might have something to do with this, though if that were the only cause, at least some messages should be acked to rabbit.

woodsaj commented 8 years ago

Comment by ctdk Tuesday Oct 20, 2015 at 23:47 GMT


There aren't any un-acked messages in a litmus stack brought up with chef and not using env-load, fwiw.

woodsaj commented 8 years ago

Comment by woodsaj Thursday Oct 22, 2015 at 14:05 GMT


looks like the problem could be due to https://github.com/raintank/grafana/blob/master/pkg/services/rabbitmq/consumer.go#L146

Msgs are being processed sequentially, so each message stays unacked until every message ahead of it has been handled. We probably need to refactor the code so that each message is processed in its own goroutine.