motiwari / BanditPAM

BanditPAM C++ implementation and Python package
MIT License
647 stars, 38 forks

Unpredictable Performance with Parallelization Enabled on ScRNA Dataset #264

Open lukeleeai opened 1 year ago

lukeleeai commented 1 year ago

I'm seeing unpredictable performance when running computations on the ScRNA dataset with parallelization turned on, which is not the expected behaviour.

In particular, when we look at the sample complexity graph (attached), there is a significant dip around 40,000 data points. Given the nature of the algorithm, we would typically expect a smooth, monotonically increasing curve instead of a dip.
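To make the dip easier to pin down when comparing runs, something like the following could flag the points where the curve stops increasing. This is only a minimal sketch: `find_dips` is a hypothetical helper, not part of BanditPAM, and the numbers are made up for illustration rather than taken from the attached graph.

```python
from typing import List, Sequence, Tuple


def find_dips(
    sizes: Sequence[int], costs: Sequence[float]
) -> List[Tuple[int, float, float]]:
    """Return (size, previous_cost, cost) wherever cost drops as size grows."""
    if len(sizes) != len(costs):
        raise ValueError("sizes and costs must have the same length")
    # Sort by dataset size so the comparison follows the x-axis of the plot.
    points = sorted(zip(sizes, costs))
    dips = []
    for (_, prev_cost), (size, cost) in zip(points, points[1:]):
        if cost < prev_cost:
            dips.append((size, prev_cost, cost))
    return dips


# Made-up numbers for illustration only; not measurements from this issue.
example_sizes = [10_000, 20_000, 30_000, 40_000, 50_000]
example_costs = [1.0e6, 2.1e6, 3.3e6, 2.9e6, 5.0e6]
print(find_dips(example_sizes, example_costs))
```

Running the same check on the sample counts from a run with parallelization on and a run with it off would show whether the dip appears only in the parallel case.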

This issue only seems to occur when parallelization is enabled and specifically with the ScRNA dataset. I've not observed the same issue with other datasets.

[Image: sample complexity graph]