mailgun / kafka-pixy

gRPC/REST proxy for Kafka
Apache License 2.0
773 stars 118 forks source link

Group consumer closed while rebalancing in progress #66

Closed horkhe closed 8 years ago

horkhe commented 8 years ago

Turned out the group consumer can stop leaving a pending rebalancing goroutine behind. That in turn was causing occasional panics like the one below:

May  7 00:45:22.164 INFO  </T[2]/cons[0]/dispatcher[0]> started
May  7 00:45:22.164 INFO  </T[2]/cons[0]/G:g1[0]> started
May  7 00:45:22.164 INFO  </T[2]/cons[0]/G:g1[0]/dispatcher[0]> started
May  7 00:45:22.164 INFO  </T[2]/cons[0]/offset_mgr_f[0]/mapper[0]> started
May  7 00:45:22.164 INFO  </T[2]/cons[0]/G:g1[0]/T:no-such-topic[0]> started
May  7 00:45:22.164 INFO  </T[2]/cons[0]/G:g1[0]/member[0]> started
May  7 00:45:22.164 INFO  </T[2]/cons[0]/G:g1[0]/manager[0]> started
May  7 00:45:22.164 INFO  </T[2]/cons[0]/G:g1[0]/msg_stream_f[0]/mapper[0]> started
May  7 00:45:22.164 INFO  [zk] Connected to 192.168.100.67:2181
May  7 00:45:22.168 INFO  [zk] Authenticated: id=95833896904687742, timeout=15000
May  7 00:45:22.180 INFO  </T[2]/cons[0]/G:g1[0]/member[0]> submitted: topics=[no-such-topic]
May  7 00:45:22.286 INFO  </T[2]/cons[0]/G:g1[0]/member[0]> fetched subscriptions: map[consumer-1:[no-such-topic]]
May  7 00:45:22.286 INFO  </T[2]/cons[0]/G:g1[0]/manager[0]/rebalancer[0]> started
May  7 00:45:22.286 INFO  [sarama] client/metadata fetching metadata for [no-such-topic] from broker 192.168.100.67:9093
May  7 00:45:22.287 INFO  [sarama] client/metadata found some partitions to be leaderless
May  7 00:45:22.287 INFO  [sarama] client/metadata retrying after 250ms... (3 attempts remaining)
May  7 00:45:22.539 INFO  [sarama] client/metadata fetching metadata for [no-such-topic] from broker 192.168.100.67:9093
May  7 00:45:22.540 INFO  [sarama] client/metadata found some partitions to be leaderless
May  7 00:45:22.540 INFO  [sarama] client/metadata retrying after 250ms... (2 attempts remaining)
May  7 00:45:22.794 INFO  [sarama] client/metadata fetching metadata for [no-such-topic] from broker 192.168.100.67:9093
May  7 00:45:22.795 INFO  [sarama] client/metadata found some partitions to be leaderless
May  7 00:45:22.795 INFO  [sarama] client/metadata retrying after 250ms... (1 attempts remaining)
May  7 00:45:23.048 INFO  [sarama] client/metadata fetching metadata for [no-such-topic] from broker 192.168.100.67:9093
May  7 00:45:23.049 INFO  [sarama] client/metadata found some partitions to be leaderless
May  7 00:45:23.049 INFO  </T[2]/cons[0]/G:g1[0]/manager[0]/rebalancer[0]> stopped
May  7 00:45:23.049 ERROR </T[2]/cons[0]/G:g1[0]/manager[0]> rebalance failed: err=(failed to get partition list: topic=no-such-topic, err=(kafka server: Request was for a topic or partition that does not exist on this broker.))
May  7 00:45:23.154 INFO  </T[2]/cons[0]/G:g1[0]/manager[0]/rebalancer[1]> started
May  7 00:45:23.154 INFO  [sarama] client/metadata fetching metadata for [no-such-topic] from broker 192.168.100.67:9093
May  7 00:45:23.155 INFO  [sarama] client/metadata found some partitions to be leaderless
May  7 00:45:23.155 INFO  [sarama] client/metadata retrying after 250ms... (3 attempts remaining)
May  7 00:45:23.165 INFO  </T[2]/cons[0]/G:g1[0]/T:no-such-topic[0]> stopped
May  7 00:45:23.165 INFO  </T[2]/cons[0]/G:g1[0]/dispatcher[0]> child stopped: /T[2]/cons[0]/G:g1[0]/T:no-such-topic[0]
May  7 00:45:23.165 INFO  </T[2]/cons[0]/G:g1[0]/dispatcher[0]> stopped
May  7 00:45:23.181 INFO  </T[2]/cons[0]/G:g1[0]/member[0]> submitted: topics=[]
May  7 00:45:23.184 INFO  </T[2]/cons[0]/G:g1[0]/member[0]> stopped
May  7 00:45:23.184 INFO  </T[2]/cons[0]/G:g1[0]/manager[0]> stopped
May  7 00:45:23.184 INFO  </T[2]/cons[0]/G:g1[0]/msg_stream_f[0]/mapper[0]> stopped
May  7 00:45:23.184 INFO  </T[2]/cons[0]/G:g1[0]> stopped
May  7 00:45:23.184 INFO  </T[2]/cons[0]/dispatcher[0]> child stopped: /T[2]/cons[0]/G:g1[0]
May  7 00:45:23.184 INFO  </T[2]/cons[0]/dispatcher[0]> stopped
May  7 00:45:23.184 INFO  </T[2]/cons[0]/offset_mgr_f[0]/mapper[0]> stopped
May  7 00:45:23.187 INFO  [zk] Recv loop terminated: err=EOF
May  7 00:45:23.187 INFO  [zk] Send loop terminated: err=<nil>
May  7 00:45:23.187 INFO  [sarama] Closing Client
May  7 00:45:23.187 INFO  [sarama] Closing Client
May  7 00:45:23.412 ERROR </T[2]/cons[0]/G:g1[0]/manager[0]/rebalancer[1]> paniced: assignment to entry in nil map, stack=goroutine 176 [running]:
runtime/debug.Stack(0x0, 0x0, 0x0)
    /usr/local/go/src/runtime/debug/stack.go:24 +0x80
github.com/mailgun/kafka-pixy/actor.Spawn.func1.1(0xc82021c7a0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/actor/actor.go:63 +0x5b
panic(0x406160, 0xc820760cc0)
    /usr/local/go/src/runtime/panic.go:443 +0x4e9
github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama.(*client).registerBroker(0xc8201ac780, 0xc82071b8f0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama/client.go:382 +0xa9
github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama.(*client).updateMetadata(0xc8201ac780, 0xc82074d920, 0xc800000000, 0x0, 0x0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama/client.go:624 +0x123
github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama.(*client).tryRefreshMetadata(0xc8201ac780, 0xc820550660, 0x1, 0x1, 0x2, 0x0, 0x0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama/client.go:591 +0x33f
github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama.(*client).tryRefreshMetadata.func1(0x974fd8, 0xc820550758, 0x0, 0x0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama/client.go:575 +0x207
github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama.(*client).tryRefreshMetadata(0xc8201ac780, 0xc820550660, 0x1, 0x1, 0x3, 0x0, 0x0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama/client.go:593 +0x473
github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama.(*client).RefreshMetadata(0xc8201ac780, 0xc820550660, 0x1, 0x1, 0x0, 0x0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama/client.go:316 +0x15c
github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama.(*client).Partitions(0xc8201ac780, 0xc8202181c1, 0xd, 0x0, 0x0, 0x0, 0x0, 0x0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama/client.go:217 +0x176
github.com/mailgun/kafka-pixy/vendor/github.com/Shopify/sarama.(Client).Partitions-fm(0xc8202181c1, 0xd, 0x0, 0x0, 0x0, 0x0, 0x0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/consumer/groupcsm/groupcsm.go:66 +0x71
github.com/mailgun/kafka-pixy/consumer/groupcsm.(*T).resolvePartitions(0xc820562820, 0xc82000a9f0, 0x1, 0x0, 0x0)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/consumer/groupcsm/groupcsm.go:269 +0x5c3
github.com/mailgun/kafka-pixy/consumer/groupcsm.(*T).runRebalancer(0xc820562820, 0xc82021c7a0, 0xc82000ba10, 0xc82000a9f0, 0xc8200f6660)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/consumer/groupcsm/groupcsm.go:201 +0x40
github.com/mailgun/kafka-pixy/consumer/groupcsm.(*T).runManager.func1()
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/consumer/groupcsm/groupcsm.go:181 +0x44
github.com/mailgun/kafka-pixy/actor.Spawn.func1(0x0, 0xc82021c7a0, 0xc82000ba70)
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/actor/actor.go:68 +0x112
created by github.com/mailgun/kafka-pixy/actor.Spawn
    /Users/maximvladimirsky/workspace/goworld/src/github.com/mailgun/kafka-pixy/actor/actor.go:69 +0x67