pysal / libpysal

Core components of Python Spatial Analysis Library
http://pysal.org/libpysal

parallelize clique expansion #764

Open knaaptime opened 3 months ago

knaaptime commented 3 months ago

On relatively large graphs (n > ~10k) with colocated points, the clique expansion function can get really slow. After cutting it short a few times, I think this for-loop is where things are getting stuck. If that's the bottleneck, it should be pretty straightforward to parallelize using something like joblib.
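For illustration only, here is a rough sketch of the general idea: treat each group of colocated points as an independent task and map the tasks over a worker pool. This is not the libpysal code. `expand_group`, `expand_all`, and the toy `groups` input are hypothetical names, and the standard library's `concurrent.futures` is swapped in for joblib so the snippet has no third-party dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def expand_group(members):
    """Hypothetical per-group step: every directed pair (i, j), i != j."""
    return list(permutations(members, 2))


def expand_all(groups, max_workers=4, executor_cls=ThreadPoolExecutor):
    """Expand each group of colocated ids independently, then concatenate."""
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the combined
        # output matches what the serial for-loop would produce.
        chunks = pool.map(expand_group, groups)
        return [pair for chunk in chunks for pair in chunk]


# Assumed toy input: point ids grouped by shared location
groups = [[0, 1, 2], [3], [4, 5]]
edges = expand_all(groups)
```

A thread pool only helps when the per-group work releases the GIL. For a pure-Python loop body, a process-based pool would be the thing to try, e.g. passing `ProcessPoolExecutor` as `executor_cls`, or the joblib equivalent `Parallel(n_jobs=...)(delayed(expand_group)(g) for g in groups)`. Whether either beats the serial loop depends on how expensive each group is relative to the dispatch overhead.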

ljwolf commented 3 months ago

My original implementation was pure pandas, based on join and explode with no loops. Would it make sense to revisit that?

knaaptime commented 3 months ago

yeah i was wondering about explode there. I presume that would be even faster?

ljwolf commented 3 months ago

I think so.

We moved away from it because (a) hashing the points to establish unique locations was expensive and (b) our policy on sorting was different, but I think we can still use pandas join and explode given the new function signature.

The earlier implementation, conceptually: