streamnative / pulsar-client-go

Apache Pulsar Go Client Library
https://pulsar.apache.org/
Apache License 2.0

ISSUE-629: goroutine crashes on broker reconnection #228

Closed: sijie closed this issue 2 years ago

sijie commented 2 years ago

Original Issue: apache/pulsar-client-go#629


While running a performance benchmark with 10 topics, 10 producers and 10 consumers, one of the Pulsar client library goroutines crashes right after a broker-initiated disconnection.

Expected behavior

The library code should properly handle a disconnection initiated by the broker.

Actual behavior

One of the goroutines runs into a nil pointer dereference, which causes a SIGSEGV:

INFO[2021-09-28 14:49:01.603058] Broker notification of Closed producer: 7     local_addr="10.65.75.144:49172" remote_addr="pulsar://10.65.74.234:6650"
WARN[2021-09-28 14:49:01.603185] [Connection was closed]                       cnx="10.65.75.144:49172 -> 10.65.74.234:6650" producerID=7 producer_name=c1-sn-platform-11-168 topic="persistent://benchmark/ns-cluster/test-topic-6"
INFO[2021-09-28 14:49:01.603234] [Reconnecting to broker in  106.038899ms]     producerID=7 producer_name=c1-sn-platform-11-168 topic="persistent://benchmark/ns-cluster/test-topic-6"
WARN[2021-09-28 14:49:01.603936] Received send error from server: [UnknownError] : []  local_addr="10.65.75.144:49172" remote_addr="pulsar://10.65.74.234:6650"
WARN[2021-09-28 14:49:01.604087] [Connection was closed]                       cnx="10.65.75.144:49172 -> 10.65.74.234:6650" producerID=3 producer_name=c1-sn-platform-11-166 topic="persistent://benchmark/ns-cluster/test-topic-2"
WARN[2021-09-28 14:49:01.604112] [Connection was closed]                       cnx="10.65.75.144:49172 -> 10.65.74.234:6650" producerID=4 producer_name=c1-sn-platform-11-167 topic="persistent://benchmark/ns-cluster/test-topic-3"
WARN[2021-09-28 14:49:01.604132] [Connection was closed]                       cnx="10.65.75.144:49172 -> 10.65.74.234:6650" producerID=8 producer_name=c1-sn-platform-11-169 topic="persistent://benchmark/ns-cluster/test-topic-7"
INFO[2021-09-28 14:49:01.604200] [Reconnecting to broker in  119.867378ms]     consumerID=7 name=cqdxu subscription=benchmark/ns-cluster/test-topic-6 topic="persistent://benchmark/ns-cluster/test-topic-6"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xac4e42]

goroutine 26 [running]:
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).handleSendError(0xc0003a0160, 0x0)
    /root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:755 +0x182
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).internalReceivedCommand(0xc0003a0160, 0xc0021288c0, 0x0, 0x0)
    /root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:539 +0x1f2
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).run(0xc0003a0160)
    /root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:408 +0x3c5
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).start.func1(0xc0003a0160)
    /root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:227 +0x85
created by github.com/apache/pulsar-client-go/pulsar/internal.(*connection).start
    /root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:223 +0x56 
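A note on reading the trace: handleSendError is called with a 0x0 argument, and the warning logged just before the panic reads "Received send error from server: [UnknownError] : []". Both are consistent with the handler receiving a nil command pointer. Protobuf-style getters such as GetError() tolerate a nil receiver and return zero values, while a direct field access does not (addr=0x8 would then be a field at offset 8 of a nil struct). The Go sketch below reproduces that pattern under this assumption. The names sendErrorCmd, serverError and handle are hypothetical stand-ins, not the library's code.

```go
package main

import "fmt"

// serverError is a hypothetical stand-in for a protobuf enum whose zero value
// is "UnknownError" (an assumption, consistent with the log output above).
type serverError int32

const (
	unknownError serverError = iota
	persistenceError
)

func (e serverError) String() string {
	switch e {
	case persistenceError:
		return "PersistenceError"
	default:
		return "UnknownError"
	}
}

// sendErrorCmd mimics a protobuf-generated message with pointer fields.
// Hypothetical type; not the library's generated code.
type sendErrorCmd struct {
	ProducerID *uint64
	Error      *serverError
	Message    *string
}

// GetError is nil-safe, like protobuf-generated getters: a nil receiver
// yields the zero value instead of panicking.
func (c *sendErrorCmd) GetError() serverError {
	if c != nil && c.Error != nil {
		return *c.Error
	}
	return unknownError
}

// GetMessage is nil-safe in the same way.
func (c *sendErrorCmd) GetMessage() string {
	if c != nil && c.Message != nil {
		return *c.Message
	}
	return ""
}

// handle logs through the nil-safe getters, then dereferences a field
// directly, which panics when cmd itself is nil. The panic is recovered
// here only so the example can report it.
func handle(cmd *sendErrorCmd) (logLine string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	logLine = fmt.Sprintf("Received send error from server: [%v] : [%s]", cmd.GetError(), cmd.GetMessage())
	_ = *cmd.Error // nil pointer dereference when cmd == nil
	return logLine, false
}

func main() {
	line, panicked := handle(nil)
	fmt.Println(line)
	fmt.Println("panicked:", panicked)
}
```

Running it prints the same "[UnknownError] : []" warning seen in the log, followed by "panicked: true".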

Steps to reproduce

I have a Go app that uses the Pulsar Go client library to run multiple producers and consumers in separate goroutines, all within the same process. It works fine on most runs, but occasionally crashes when the broker decides to disconnect. This is probably a race condition between the disconnection event and internalReceivedCommand()/handleSendError().

When using the other perf tool, pulsar-perf, we get the following error message:

03:36:17.548 [pulsar-client-io-2-1] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0xb0d1d04e, L:/10.42.0.8:38398 - R:10.65.74.234/10.65.74.234:6650] Received send error from server: PersistenceError : org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available

Perhaps the same error condition is causing the crash in the Go client library.

System configuration

Pulsar version: 2.8.0
pulsar-client-go: 0.6.0