While running a performance benchmark with 10 topics, 10 producers and 10 consumers, one of the Pulsar client library goroutines crashes right after a broker-initiated disconnection.
Expected behavior
The library code should properly handle a disconnection initiated by the broker, without crashing.
Actual behavior
One of the goroutines runs into a nil pointer dereference, which causes a SIGSEGV:
INFO[2021-09-28 14:49:01.603058] Broker notification of Closed producer: 7 local_addr="10.65.75.144:49172" remote_addr="pulsar://10.65.74.234:6650"
WARN[2021-09-28 14:49:01.603185] [Connection was closed] cnx="10.65.75.144:49172 -> 10.65.74.234:6650" producerID=7 producer_name=c1-sn-platform-11-168 topic="persistent://benchmark/ns-cluster/test-topic-6"
INFO[2021-09-28 14:49:01.603234] [Reconnecting to broker in 106.038899ms] producerID=7 producer_name=c1-sn-platform-11-168 topic="persistent://benchmark/ns-cluster/test-topic-6"
WARN[2021-09-28 14:49:01.603936] Received send error from server: [UnknownError] : [] local_addr="10.65.75.144:49172" remote_addr="pulsar://10.65.74.234:6650"
WARN[2021-09-28 14:49:01.604087] [Connection was closed] cnx="10.65.75.144:49172 -> 10.65.74.234:6650" producerID=3 producer_name=c1-sn-platform-11-166 topic="persistent://benchmark/ns-cluster/test-topic-2"
WARN[2021-09-28 14:49:01.604112] [Connection was closed] cnx="10.65.75.144:49172 -> 10.65.74.234:6650" producerID=4 producer_name=c1-sn-platform-11-167 topic="persistent://benchmark/ns-cluster/test-topic-3"
WARN[2021-09-28 14:49:01.604132] [Connection was closed] cnx="10.65.75.144:49172 -> 10.65.74.234:6650" producerID=8 producer_name=c1-sn-platform-11-169 topic="persistent://benchmark/ns-cluster/test-topic-7"
INFO[2021-09-28 14:49:01.604200] [Reconnecting to broker in 119.867378ms] consumerID=7 name=cqdxu subscription=benchmark/ns-cluster/test-topic-6 topic="persistent://benchmark/ns-cluster/test-topic-6"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xac4e42]
goroutine 26 [running]:
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).handleSendError(0xc0003a0160, 0x0)
/root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:755 +0x182
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).internalReceivedCommand(0xc0003a0160, 0xc0021288c0, 0x0, 0x0)
/root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:539 +0x1f2
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).run(0xc0003a0160)
/root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:408 +0x3c5
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).start.func1(0xc0003a0160)
/root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:227 +0x85
created by github.com/apache/pulsar-client-go/pulsar/internal.(*connection).start
/root/go/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:223 +0x56
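In the trace, the frame `handleSendError(0xc0003a0160, 0x0)` lists the receiver followed by a `0x0` argument, which suggests the method was handed a nil pointer. The warning logged just before the panic also prints only zero values (`[UnknownError] : []`). The following is a minimal sketch, using hypothetical stand-in types (`CommandError`, `ServerError`, `describe`) rather than the library's real protobuf types, of how a single nil pointer can produce both symptoms. Protobuf-style getters are nil-safe and return zero values, so the log call succeeds, while a direct field access on the same pointer panics. Go reports a nil field access at an address equal to the field's offset, which would be consistent with `addr=0x8` if the field being read sits 8 bytes into the struct.

```go
package main

import (
	"fmt"
	"runtime"
)

// ServerError is a hypothetical stand-in for a protobuf enum; its zero value
// plays the role of "UnknownError".
type ServerError int32

const UnknownError ServerError = 0

func (e ServerError) String() string {
	if e == UnknownError {
		return "UnknownError"
	}
	return fmt.Sprintf("ServerError(%d)", int32(e))
}

// CommandError is a hypothetical message type, not the library's real one.
type CommandError struct {
	Error     ServerError // offset 0
	RequestID uint64      // offset 8 on amd64
	Message   string
}

// GetError mimics a protobuf-generated getter: safe to call on a nil receiver.
func (c *CommandError) GetError() ServerError {
	if c == nil {
		return UnknownError
	}
	return c.Error
}

// GetMessage is nil-safe in the same way.
func (c *CommandError) GetMessage() string {
	if c == nil {
		return ""
	}
	return c.Message
}

// describe formats the log line via the nil-safe getters, then reads a field
// directly (hypothetically, to look up the failed request). A runtime panic
// from that read is recovered and returned as err.
func describe(c *CommandError) (logLine string, requestID uint64, err error) {
	defer func() {
		if r := recover(); r != nil {
			re, ok := r.(runtime.Error)
			if !ok {
				panic(r) // not a runtime error: re-raise it
			}
			err = re
		}
	}()
	logLine = fmt.Sprintf("Received send error from server: [%v] : [%s]", c.GetError(), c.GetMessage())
	requestID = c.RequestID // direct field access: panics when c is nil
	return logLine, requestID, nil
}

func main() {
	line, _, err := describe(nil)
	fmt.Println(line)
	fmt.Println("recovered:", err)
}
```

Running `main` prints a log line with the same `[UnknownError] : []` shape as the warning above, followed by the recovered nil pointer dereference error.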
Steps to reproduce
I have a Go app that uses the Pulsar Go client library to run multiple producers/consumers in different goroutines (all in the same process).
It works fine on most runs but occasionally crashes when the broker decides to disconnect. I suspect a race condition between a disconnection event and internalReceivedCommand()/handleSendError().
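If the crash is a race between a disconnection and the read loop's command handling, a common Go way to make that race harmless is to treat "closed" as state guarded by the same mutex as the pending-request table. A send error that arrives during or after a close then becomes a no-op instead of a lookup into torn-down state. The following is a minimal sketch under that assumption. The `conn` type and its `send`, `handleSendError` and `close` methods are hypothetical; they only borrow names from the stack trace and are not pulsar-client-go's implementation. The hypothetical `run` helper starts 10 producer goroutines, as in the benchmark, while another goroutine closes the connection concurrently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is returned for operations on a connection that is already closed.
var errClosed = errors.New("connection closed")

// conn is a hypothetical connection model, not the library's type.
type conn struct {
	mu      sync.Mutex
	closed  bool
	nextID  uint64
	pending map[uint64]chan error // per-request result channels; nil once closed
}

func newConn() *conn {
	return &conn{pending: make(map[uint64]chan error)}
}

// send registers a pending request and fails fast once the conn is closed.
func (c *conn) send() (uint64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return 0, errClosed
	}
	c.nextID++
	c.pending[c.nextID] = make(chan error, 1)
	return c.nextID, nil
}

// handleSendError tolerates a closed conn and an unknown request ID
// instead of assuming the request is still there.
func (c *conn) handleSendError(id uint64, cause error) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return false
	}
	ch, ok := c.pending[id]
	if !ok {
		return false
	}
	delete(c.pending, id)
	ch <- cause // buffered, and each request is delivered at most once
	return true
}

// close fails every pending request once; closing again is a no-op.
func (c *conn) close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return
	}
	c.closed = true
	for id, ch := range c.pending {
		ch <- errClosed
		delete(c.pending, id)
	}
	c.pending = nil
}

// run races producers (each send followed by a server-side send error)
// against a concurrent close, counting registered and rejected sends.
func run(producers, perProducer int) (sent, rejected int) {
	c := newConn()
	var wg sync.WaitGroup
	var mu sync.Mutex
	for p := 0; p < producers; p++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perProducer; i++ {
				id, err := c.send()
				mu.Lock()
				if err != nil {
					rejected++
				} else {
					sent++
				}
				mu.Unlock()
				if err == nil {
					c.handleSendError(id, errors.New("transient"))
				}
			}
		}()
	}
	wg.Add(1)
	go func() { defer wg.Done(); c.close() }()
	wg.Wait()
	c.close()
	return sent, rejected
}

func main() {
	sent, rejected := run(10, 1000)
	fmt.Println("sent:", sent, "rejected:", rejected)
}
```

Whichever way the goroutines interleave, every send is either registered or rejected with `errClosed`, and a late send error is ignored rather than dereferenced. Running the program with `go run -race` is a cheap way to check such a guard.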
When using the other perf tool, pulsar-perf, we get the following error message:
03:36:17.548 [pulsar-client-io-2-1] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0xb0d1d04e, L:/10.42.0.8:38398 - R:10.65.74.234/10.65.74.234:6650] Received send error from server: PersistenceError : org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available
Perhaps the same error condition is causing the crash in the Go client library.
Original Issue: apache/pulsar-client-go#629
System configuration
Pulsar version: 2.8.0
pulsar-client-go: 0.6.0