pysal / spaghetti

SPAtial GrapHs: nETworks, Topology, & Inference
http://pysal.org/spaghetti/
BSD 3-Clause "New" or "Revised" License
260 stars 69 forks source link

Enhancement: Optional parameter to only compute network cost matrix values if they are within X euclidean units #757

Open iboates opened 1 year ago

iboates commented 1 year ago

I am dealing with a case in which I have many thousands of points in a huge area. As a result, the cost matrix computation takes an exceedingly long time. I think that this could be mitigated if there were an optional parameter which, when specified, will make allneighbordistances quickly compute the euclidean distance between each client/facility pair, and if it is greater than this parameter, will simply set the cost matrix value to NaN. I think that this could really speed up computation in this case, at the expense of losing some connectivity information in the cost matrix - but in some cases it may not be relevant.

I think that this could really help in cases where you are doing facility location optimization with a very large number of candidate sites in a very dense environment (like a city), where the network travel distance is actually based on walking speed. In a case like this, it is quite likely to be irrelevant to compute the walking distance from one end of the city to another.

I would be interested in trying to implement something like this, but I'm new to the library and would like to get opinions & caveats from more experienced people on it.

gegen07 commented 1 year ago

Hey @iboates, great to see you here!

The cost matrix is not computed by spopt, we use allneighbordistances only in examples to illustrate the model usage. In geodataframe function, we use cdist function by scipy to calculate the distance between a pair of demand and candidate site based on a metric chosen by the user. So, it is hard to take this kind of issue because we probably have to implement all these metrics and is not the purpose of this package.

I think this issue can be discussed on spaghetti because it is specific to networks.

Another way to handle this issue can be to partition your matrix $N X M$ to ($N X C$ and $N X D$ such that $D+C = M$), then you concatenate the matrix and the result is your complete cost matrix. Note that you can have many parts of matrix to soften the complexity of these computations.