Open knaaptime opened 3 months ago
My original implementation was pure pandas, based on join and explode with no loops. Would it make sense to revisit that?
Yeah, I was wondering about explode there. I presume that would be even faster?
I think so.
We moved away from it because (a) hashing the points to establish unique locations was expensive and (b) our policy on sorting was different, but I think we can still use pandas join and explode given the new function signature.
The earlier implementation, conceptually:
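The original snippet is not reproduced in this thread. As a rough illustration only, and not the earlier code itself, here is a minimal sketch of how a join-and-explode clique expansion could look without any Python loops. The function name `expand_cliques`, the column names `focal`/`neighbor`/`weight`, the `members` lookup (unique location id to list of point ids) and the `clique_weight` parameter are all assumptions made up for this example.

```python
import pandas as pd


def expand_cliques(adjacency, members, clique_weight=1.0):
    """Hypothetical sketch: expand a graph on unique locations to all points.

    adjacency : DataFrame with columns focal, neighbor, weight, where the ids
        refer to unique locations (assumed layout).
    members : Series indexed by location id whose values are lists of the
        point ids sitting on that location (assumed layout).
    """
    # Edges between locations: swap each location id for its member points.
    between = (
        adjacency.join(members.rename("focal_point"), on="focal")
        .explode("focal_point", ignore_index=True)
        .join(members.rename("neighbor_point"), on="neighbor")
        .explode("neighbor_point", ignore_index=True)
    )
    between = between[["focal_point", "neighbor_point", "weight"]].rename(
        columns={"focal_point": "focal", "neighbor_point": "neighbor"}
    )

    # Edges within a location: every ordered pair of distinct colocated points.
    within = (
        pd.DataFrame({"focal": members, "neighbor": members})
        .explode("focal", ignore_index=True)
        .explode("neighbor", ignore_index=True)
    )
    within = within[within["focal"] != within["neighbor"]].assign(
        weight=clique_weight
    )

    return pd.concat([between, within], ignore_index=True)


# Location 0 holds points 0 and 1, location 1 holds point 2.
adjacency = pd.DataFrame(
    {"focal": [0, 1], "neighbor": [1, 0], "weight": [1.0, 1.0]}
)
members = pd.Series({0: [0, 1], 1: [2]})
expanded = expand_cliques(adjacency, members)
```

Note that this sketch does no sorting and does not hash coordinates; it assumes the mapping from locations to points already exists, which is the expensive step mentioned above.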
On relatively large graphs (n > ~10k) with colocated points, the clique expansion function can get really slow. After cutting it short a few times, I think this for-loop is where things are getting stuck. If that's the bottleneck, I think it should be pretty straightforward to parallelize using something like joblib.
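For reference, a minimal sketch of what the joblib route might look like, assuming the loop body can be written as an independent function per group of colocated points. `clique_edges` and `parallel_clique_edges` are hypothetical names for this example and are not the library's actual loop.

```python
from itertools import permutations

from joblib import Parallel, delayed


def clique_edges(point_ids):
    # Hypothetical loop body: all ordered pairs of distinct colocated points.
    return list(permutations(point_ids, 2))


def parallel_clique_edges(groups, n_jobs=2):
    # One task per group of colocated points; results come back in input order.
    chunks = Parallel(n_jobs=n_jobs)(delayed(clique_edges)(g) for g in groups)
    # Flatten the per-group edge lists into one list.
    return [edge for chunk in chunks for edge in chunk]


edges = parallel_clique_edges([[0, 1], [2], [3, 4, 5]])
```

With many tiny groups, the per-task dispatch overhead may outweigh any gain, so batching several groups into each task would probably be needed in practice.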