streamnative / pulsar-archived

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org
Apache License 2.0

ISSUE-13523: Killing a broker, or an ack that takes a long time, causes cluster traffic to drop instantaneously #3488

Open sijie opened 2 years ago

sijie commented 2 years ago

Original Issue: apache/pulsar#13523


Is your enhancement request related to a problem? Please describe.

I have nine brokers. When I kill one, the traffic on the remaining eight brokers drops for roughly 30 seconds, peaks once those 30 seconds have passed, and then returns to normal. I found that the sending thread is blocked here:

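    private boolean canEnqueueRequest(SendCallback callback, long sequenceId, int payloadSize) {
        try {
            if (conf.isBlockIfQueueFull()) {
                if (semaphore.isPresent()) {
                    // Semaphore.acquire() parks the calling thread until another
                    // pending message completes and releases a permit
                    semaphore.get().acquire();
                }
                // ...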

In other words, if the producer queue for some partition is full, the sending thread blocks, which also holds up sends to all of the other partitions.
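To make the failure mode concrete, here is a minimal model of it. It assumes that each partition caps its in-flight messages with its own semaphore (as in the excerpt above, with `blockIfQueueFull` enabled) and that a single application thread sends to all partitions in turn. `HeadOfLineBlockingDemo` and its methods are hypothetical illustrations, not Pulsar classes.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model of the reported problem (not Pulsar code): each partition
 * caps in-flight messages with its own Semaphore, but one application thread
 * sends to every partition in turn using blocking acquires.
 */
public class HeadOfLineBlockingDemo {
    private final Semaphore[] pendingPermits;
    final AtomicInteger[] sent;

    HeadOfLineBlockingDemo(int partitions, int maxPendingPerPartition) {
        pendingPermits = new Semaphore[partitions];
        sent = new AtomicInteger[partitions];
        for (int i = 0; i < partitions; i++) {
            pendingPermits[i] = new Semaphore(maxPendingPerPartition);
            sent[i] = new AtomicInteger();
        }
    }

    /** Mirrors blockIfQueueFull=true: park the caller until the partition has room. */
    void send(int partition) throws InterruptedException {
        pendingPermits[partition].acquire();
        sent[partition].incrementAndGet();
    }

    /** A broker ack frees one pending slot on the partition. */
    void ack(int partition) {
        pendingPermits[partition].release();
    }

    /** One sender thread, round-robin; every partition except 0 is acked at once. */
    Thread startRoundRobinSender(int messages) {
        Thread t = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) {
                    int p = i % pendingPermits.length;
                    send(p);
                    if (p != 0) {
                        ack(p); // healthy broker: immediate ack
                    }
                    // partition 0 is never acked, as if its broker was killed
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Waits up to waitMillis, then reports whether the sender is still stuck. */
    static boolean stillBlockedAfter(Thread sender, long waitMillis) {
        try {
            sender.join(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sender.isAlive();
    }

    public static void main(String[] args) {
        HeadOfLineBlockingDemo demo = new HeadOfLineBlockingDemo(2, 1);
        Thread sender = demo.startRoundRobinSender(10);
        System.out.println("sender blocked: " + stillBlockedAfter(sender, 500));
        System.out.println("partition 1 sends: " + demo.sent[1].get());
        sender.interrupt();
    }
}
```

Partition 1 is healthy, yet only one of its messages gets through: the shared thread is parked on partition 0's semaphore, which is the cross-partition stall described above.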

Describe the solution you'd like

Isolate partitions from one another, so that a failure while sending to one partition does not affect sends to the others.
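One possible way to get this isolation, sketched here under stated assumptions rather than as how Pulsar does or should do it, is to give each partition its own single-threaded "lane", so a blocking acquire on a full partition parks only that partition's thread. `PerPartitionLanes`, `sendAsync`, and `awaitSent` are hypothetical names; the semaphore again stands in for the per-partition pending-message limit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Pulsar code): one single-threaded lane per partition,
 * so a full queue on one partition cannot stall sends to the others.
 */
public class PerPartitionLanes {
    private final Semaphore[] pendingPermits;
    private final ExecutorService[] lanes;
    final AtomicInteger[] sent;

    PerPartitionLanes(int partitions, int maxPendingPerPartition) {
        pendingPermits = new Semaphore[partitions];
        lanes = new ExecutorService[partitions];
        sent = new AtomicInteger[partitions];
        for (int i = 0; i < partitions; i++) {
            pendingPermits[i] = new Semaphore(maxPendingPerPartition);
            sent[i] = new AtomicInteger();
            lanes[i] = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // a stuck lane must not keep the JVM alive
                return t;
            });
        }
    }

    /** Never blocks the caller; the blocking acquire runs on the partition's lane. */
    void sendAsync(int partition, boolean ackedByBroker) {
        lanes[partition].execute(() -> {
            try {
                pendingPermits[partition].acquire();
                sent[partition].incrementAndGet();
                if (ackedByBroker) {
                    pendingPermits[partition].release(); // simulate a prompt ack
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Polls until the partition's send count reaches target or the timeout expires. */
    boolean awaitSent(int partition, int target, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (sent[partition].get() < target) {
            if (System.currentTimeMillis() > deadline) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    void shutdownNow() {
        for (ExecutorService lane : lanes) {
            lane.shutdownNow();
        }
    }

    public static void main(String[] args) {
        PerPartitionLanes producer = new PerPartitionLanes(2, 1);
        for (int i = 0; i < 10; i++) {
            int p = i % 2;
            producer.sendAsync(p, p != 0); // partition 0's broker never acks
        }
        System.out.println("partition 1 finished: " + producer.awaitSent(1, 5, 2000));
        System.out.println("partition 0 got past message 1: " + producer.awaitSent(0, 2, 200));
        producer.shutdownNow();
    }
}
```

The trade-off is one thread per partition, and tasks for a dead partition pile up in that lane's unbounded queue, so a real design would still need some overall bound (for example, a memory limit) on queued messages.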

Describe alternatives you've considered

PartitionedProducerImpl could automatically skip the per-partition producers whose queues are full.
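The alternative can be sketched as round-robin routing that checks each partition with a non-blocking `Semaphore.tryAcquire()` and moves on when that partition's queue is full. This is a minimal sketch assuming one semaphore per partition; `SkipFullPartitionsRouter` and `reservePartition` are hypothetical and are not Pulsar's `MessageRouter` API.

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch (not Pulsar's MessageRouter API): round-robin routing
 * that skips partitions whose pending queue is full instead of blocking on them.
 */
public class SkipFullPartitionsRouter {
    private final Semaphore[] pendingPermits;
    private int next = 0; // round-robin cursor

    SkipFullPartitionsRouter(int partitions, int maxPendingPerPartition) {
        pendingPermits = new Semaphore[partitions];
        for (int i = 0; i < partitions; i++) {
            pendingPermits[i] = new Semaphore(maxPendingPerPartition);
        }
    }

    /**
     * Tries each partition at most once, continuing from the cursor, and reserves
     * a pending slot on the first one with room. Returns -1 if every queue is
     * full, leaving the caller to block or fail.
     */
    synchronized int reservePartition() {
        int n = pendingPermits.length;
        for (int attempt = 0; attempt < n; attempt++) {
            int p = next;
            next = (next + 1) % n;
            if (pendingPermits[p].tryAcquire()) { // non-blocking check
                return p;
            }
        }
        return -1;
    }

    /** Called when the broker acks a message on this partition. */
    void ack(int partition) {
        pendingPermits[partition].release();
    }

    public static void main(String[] args) {
        SkipFullPartitionsRouter router = new SkipFullPartitionsRouter(3, 1);
        System.out.println(router.reservePartition()); // partition 0 is never acked below
        System.out.println(router.reservePartition());
        router.ack(1);                                 // only partition 1 gets an ack
        System.out.println(router.reservePartition());
        System.out.println(router.reservePartition()); // skips full partition 0
        System.out.println(router.reservePartition()); // everything is full now
    }
}
```

This only suits messages without a key: a keyed message has to go to the partition its key maps to, so skipping that partition would break per-key ordering.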

github-actions[bot] commented 2 years ago

The issue had no activity for 30 days, so it was marked with the Stale label.