streamnative / pulsar-archived

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org
Apache License 2.0

ISSUE-12844: [pulsar-client-cpp] Excessive locking cause significant performance degradation #3283

Open sijie opened 2 years ago

sijie commented 2 years ago

Original Issue: apache/pulsar-client-cpp#116


Describe the bug: The statistics implementation in the C++ client has two concurrency issues.

  1. The ProducerStatsImpl (and ConsumerStatsImpl) classes use a single shared lock to protect access to their internal data. The lock is taken for every sent or received message. Under high load this shared lock causes significant contention and performance degradation. The profiler shows that sending and receiving threads block each other.

[Screenshot: original-profiling]

Since the sending and receiving functions access different subsets of members, they should be protected by different mutexes, or another approach should be chosen. For example, after patching this issue I got about a 1/3 throughput improvement. As you can see in the screenshot below, threads are waiting on I/O rather than on mutexes.

[Screenshot: patched-profiling]
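To illustrate the idea of protecting the send path and the receive path with different mutexes, here is a minimal sketch. It is not the actual ProducerStatsImpl code: the class `SplitLockStats`, its counters, and the helper `hammer` are hypothetical names chosen for this example, and the real classes track more data than these three counters.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical, simplified stand-in for a stats class (not the real ProducerStatsImpl).
// Send-side and receive-side counters are guarded by separate mutexes, so a thread
// recording a sent message never waits for a thread recording a received one.
class SplitLockStats {
public:
    void messageSent(std::uint64_t bytes) {
        std::lock_guard<std::mutex> lock(sendMutex_);
        ++numMsgsSent_;
        numBytesSent_ += bytes;
    }

    void messageReceived() {
        std::lock_guard<std::mutex> lock(receiveMutex_);
        ++numAcksReceived_;
    }

    std::uint64_t numMsgsSent() const {
        std::lock_guard<std::mutex> lock(sendMutex_);
        return numMsgsSent_;
    }

    std::uint64_t numBytesSent() const {
        std::lock_guard<std::mutex> lock(sendMutex_);
        return numBytesSent_;
    }

    std::uint64_t numAcksReceived() const {
        std::lock_guard<std::mutex> lock(receiveMutex_);
        return numAcksReceived_;
    }

private:
    // Members touched only by the send path.
    mutable std::mutex sendMutex_;
    std::uint64_t numMsgsSent_ = 0;
    std::uint64_t numBytesSent_ = 0;

    // Members touched only by the receive path.
    mutable std::mutex receiveMutex_;
    std::uint64_t numAcksReceived_ = 0;
};

// Hypothetical helper: drives both paths concurrently, n updates per thread.
inline void hammer(SplitLockStats& stats, int n) {
    assert(n >= 0);
    std::thread sender([&stats, n] {
        for (int i = 0; i < n; ++i) stats.messageSent(100);
    });
    std::thread receiver([&stats, n] {
        for (int i = 0; i < n; ++i) stats.messageReceived();
    });
    sender.join();
    receiver.join();
}
```

Any code that needs a consistent snapshot of both groups (for example a periodic flush) would have to take both mutexes, always in the same order. Per-counter `std::atomic` values would be another option if no cross-counter consistency is required.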

  2. The ProducerStatsImpl implementation has a race between its destructor and the DeadlineTimer callback. Consider the following scenario:

    1. The ProducerStatsImpl destructor acquires the mutex.
    2. DeadlineTimer invokes the flushAndReset callback, which blocks on the mutex.
    3. The destructor calls timer.cancel, which cancels any pending operation but cannot cancel the callback that is already executing from step 2.
    4. The destructor releases the mutex.
    5. The DeadlineTimer callback acquires the mutex.
    6. The destructor destroys the object.
    7. The DeadlineTimer callback accesses deallocated memory.
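One common way to avoid this kind of destructor/callback race is to let the callback hold a `std::weak_ptr` to the stats object and promote it to a `std::shared_ptr` before touching any member. The sketch below shows that pattern under simplifying assumptions: `TimedStats`, `makeTimerCallback`, and `lateCallbackIsSkipped` are hypothetical names, and a plain `std::function` stands in for the real timer handler. It is not the fix adopted in the client, only an illustration of the technique.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a stats object whose flush is driven by a timer.
class TimedStats : public std::enable_shared_from_this<TimedStats> {
public:
    // Builds the callback a timer would invoke. It captures only a weak_ptr,
    // so a pending callback does not keep the object alive by itself.
    std::function<bool()> makeTimerCallback() {
        std::weak_ptr<TimedStats> weakSelf = shared_from_this();
        return [weakSelf]() {
            // Promote to shared_ptr: either the object is alive and stays alive
            // for the duration of this call, or it is already gone and we skip.
            std::shared_ptr<TimedStats> self = weakSelf.lock();
            if (!self) {
                return false;
            }
            self->flushAndReset();
            return true;
        };
    }

    void messageSent() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++numMsgsSent_;
    }

    void flushAndReset() {
        std::lock_guard<std::mutex> lock(mutex_);
        numMsgsSent_ = 0;
    }

    std::uint64_t numMsgsSent() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return numMsgsSent_;
    }

private:
    mutable std::mutex mutex_;
    std::uint64_t numMsgsSent_ = 0;
};

// Hypothetical demo: a callback that fires after the owner dropped the object
// returns false instead of touching freed memory.
inline bool lateCallbackIsSkipped() {
    auto stats = std::make_shared<TimedStats>();
    std::function<bool()> callback = stats->makeTimerCallback();

    stats->messageSent();
    assert(callback());                 // object alive: flush runs
    assert(stats->numMsgsSent() == 0);  // counter was reset by the flush

    stats.reset();                      // owner releases the object
    return !callback();                 // late callback detects this and does nothing
}
```

With this pattern the object must be owned by a `std::shared_ptr`, since `shared_from_this()` is only valid in that case, and the destructor can no longer run while a callback holds a promoted `shared_ptr`.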

Are you willing to accept a PR for issue number one, or for both?

github-actions[bot] commented 2 years ago

The issue had no activity for 30 days; marking it with the Stale label.