streadway / amqp

Go client for AMQP 0.9.1
http://godoc.org/github.com/streadway/amqp
BSD 2-Clause "Simplified" License

Growing RAM usage in recvContent #272

Closed: Dieterbe closed this issue 7 years ago

Dieterbe commented 7 years ago

Hello amqp maintainers, I run an open source project (https://github.com/graphite-ng/carbon-relay-ng) that uses this library, and one of my users is running into a memory leak problem:

go tool pprof carbon-relay-ng carbon-relay-ng.heap
Entering interactive mode (type "help" for commands)
(pprof) top30
77518.24MB of 78140.35MB total (99.20%)
Dropped 123 nodes (cum <= 390.70MB)
      flat  flat%   sum%        cum   cum%
76608.96MB 98.04% 98.04% 77518.24MB 99.20%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Channel).recvContent
  909.28MB  1.16% 99.20%   909.28MB  1.16%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.newDelivery
         0     0% 99.20%   909.28MB  1.16%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Channel).dispatch
         0     0% 99.20% 77518.24MB 99.20%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Connection).demux
         0     0% 99.20% 77518.24MB 99.20%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Connection).dispatchN
         0     0% 99.20% 77687.28MB 99.42%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Connection).reader
         0     0% 99.20% 78139.85MB   100%  runtime.goexit
(pprof) top30 -cum
77518.24MB of 78140.35MB total (99.20%)
Dropped 123 nodes (cum <= 390.70MB)
      flat  flat%   sum%        cum   cum%
         0     0%     0% 78139.85MB   100%  runtime.goexit
         0     0%     0% 77687.28MB 99.42%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Connection).reader
76608.96MB 98.04% 98.04% 77518.24MB 99.20%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Channel).recvContent
         0     0% 98.04% 77518.24MB 99.20%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Connection).demux
         0     0% 98.04% 77518.24MB 99.20%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Connection).dispatchN
         0     0% 98.04%   909.28MB  1.16%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Channel).dispatch
  909.28MB  1.16% 99.20%   909.28MB  1.16%  github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.newDelivery
(pprof) list amqp.*Channel.*recvContent
Total: 76.31GB
ROUTINE ======================== github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp.(*Channel).recvContent in /home/ubuntu/go/src/github.com/graphite-ng/carbon-relay-ng/vendor/github.com/streadway/amqp/channel.go
   74.81GB    75.70GB (flat, cum) 99.20% of Total
         .          .    365:   case *headerFrame:
         .          .    366:       // drop and reset
         .          .    367:       return ch.transition((*Channel).recvMethod)
         .          .    368:
         .          .    369:   case *bodyFrame:
   74.81GB    74.81GB    370:       ch.body = append(ch.body, frame.Body...)
         .          .    371:
         .          .    372:       if uint64(len(ch.body)) >= ch.header.Size {
         .          .    373:           ch.message.setContent(ch.header.Properties, ch.body)
         .          .    374:           ch.dispatch(ch.message) // termination state
         .   909.28MB    375:           return ch.transition((*Channel).recvMethod)
         .          .    376:       }
         .          .    377:
         .          .    378:       return ch.transition((*Channel).recvContent)
         .          .    379:   }
         .          .    380:
(pprof) 

the user also reported:

I can correlate the memory growth with inbound bandwidth: at ~60Mbps it's OK, when it goes past ~65Mbps it starts to grow, and at ~70Mbps it grows like crazy. top says 136% CPU used (the box has 20 cores).

Our amqp consumption routine is single threaded (see https://github.com/graphite-ng/carbon-relay-ng/blob/fa4a07caa9a9f85463441cfb66d615bc475a320b/cmd/carbon-relay-ng/carbon-relay-ng.go#L193 and https://github.com/graphite-ng/carbon-relay-ng/blob/fa4a07caa9a9f85463441cfb66d615bc475a320b/input/amqp.go#L33), so my hypothesis is this: as long as we can handle the ingest volume, there is no problem. But once the volume is too high, the amqp consumption maxes out a single CPU core, the amqp library receives faster than we can consume, and that is what allocates all that memory. The remaining ~36% CPU is just the other work the relay does.
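
For context, the consumption side is essentially one goroutine ranging over the delivery channel. A minimal sketch of that pattern (illustrative names, not our exact code):

package example // illustrative

import "github.com/streadway/amqp"

// consumeLoop stands in for the relay's single consumption goroutine.
func consumeLoop(deliveries <-chan amqp.Delivery, handle func([]byte)) {
	// One goroutine drains the channel; when handle() is slower than the
	// broker delivers, unread data backs up inside the client.
	for d := range deliveries {
		handle(d.Body)
	}
}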

Note this was produced with amqp library d75c3a341ff43309ad0cb69ac8bdbd1d8772775f dated 2017-02-03T13:40:38Z. I just updated the carbon-relay-ng project to use the latest amqp master (https://github.com/graphite-ng/carbon-relay-ng/commit/3e876644819001f0ad21ac0bc427a5eff9cb7332); it would be nicer to reproduce with that, though I didn't see any changes obviously related to this code.

Now, the route in the relay was configured like so:

[amqp]
amqp_enabled = true
amqp_host = "host"
amqp_port = 5672
amqp_user = "foo"
amqp_password = "bar"
amqp_vhost = "/graphite"
amqp_exchange = "aggr_exchange"
amqp_queue = "metrictank002-carbon-relay-ng"
amqp_key = "#"
amqp_durable = false
amqp_exclusive = true

As for how that translates into the options passed to the amqp library, it's probably easiest/clearest if I just link you to our code at https://github.com/graphite-ng/carbon-relay-ng/blob/fa4a07caa9a9f85463441cfb66d615bc475a320b/input/amqp.go (see the StartAMQP and consumeAMQP functions).
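
In rough terms the declare/bind/consume sequence implied by that config looks like this (a simplified sketch, not a copy of the actual functions; connection setup omitted, and auto-ack with no prefetch matches what the reply below points out):

// ch is an open *amqp.Channel
q, err := ch.QueueDeclare(
	"metrictank002-carbon-relay-ng", // amqp_queue
	false,                           // durable    (amqp_durable = false)
	false,                           // autoDelete
	true,                            // exclusive  (amqp_exclusive = true)
	false,                           // noWait
	nil,                             // args
)
if err != nil {
	return err
}
if err := ch.QueueBind(q.Name, "#", "aggr_exchange", false, nil); err != nil {
	return err
}
// autoAck = true, exclusive = true, and no Qos/prefetch is set anywhere.
deliveries, err := ch.Consume(q.Name, "", true, true, false, false, nil)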

PS: this issue looks somewhat similar to #149 and #264, but I'm not sure.

Personally I know little about rabbitmq/amqp (in fact this functionality was contributed by others, but I'm trying to support it), so I'm not even sure whether we're misusing the library or not; to you it might be obvious.

thanks!

michaelklishin commented 7 years ago

Your code seems to rely on the automatic acknowledgement mode and no prefetch (see tutorial 2). Therefore RabbitMQ will deliver to it as fast as possible and there is no way for the consumer to throttle the rate.

Have you considered using manual acknowledgement and a bounded prefetch, say, 200 or 300?
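
In code that would look roughly like this (300 is just an example value; conn, queueName and process are placeholders):

ch, err := conn.Channel()
if err != nil {
	return err
}
// Bound the number of unacknowledged deliveries the server pushes at once.
if err := ch.Qos(300, 0, false); err != nil {
	return err
}
// autoAck = false: each delivery stays outstanding until acked below.
deliveries, err := ch.Consume(queueName, "", false, true, false, false, nil)
if err != nil {
	return err
}
for d := range deliveries {
	process(d)   // your handler
	d.Ack(false) // ack after processing so the server sends more
}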

nvx commented 7 years ago

Unless you are cancelling consumers or closing channels, #264 is unrelated. That issue only leaks memory when there are in-flight messages (the higher the prefetch, the bigger the effect) at the time a consumer is cancelled and you don't explicitly drain the consumer delivery chan.
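
For reference, if you do cancel a consumer, the pattern that avoids that leak is roughly this (ch, consumerTag and deliveries are placeholders):

// Ask the server to stop sending deliveries for this consumer.
if err := ch.Cancel(consumerTag, false); err != nil {
	return err
}
// Keep draining until the library closes the chan, so nothing buffered
// client-side is left behind.
for d := range deliveries {
	d.Nack(false, true) // requeue, or handle/ack if you still want the data
}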

Dieterbe commented 7 years ago

Thanks for your help. If it looks like the causes of the problem are all on our side (which seems to be the case), feel free to close this ticket.