pysal / libpysal

Core components of Python Spatial Analysis Library
http://pysal.org/libpysal
Other
258 stars 78 forks source link

Interest in a mutual knn? #664

Open ljwolf opened 9 months ago

ljwolf commented 9 months ago

For a different causal inference project, I'm writing a few spatial/feature matching algorithms.

I think we may want to offer a Mutual_KNN() constructor in Graph, and also bring a Symmetric_KNN()? or have symmetric/mutual options in a knn constructor?

This is like the current .symmetrize() function, but instead of adding edges to the KNN graph to induce symmetry, it removes edges to the KNN graph who are not mutually k-near.

This could also be implemented as a separate function for arbitrary graphs after weighting/kerneling, since it's just based on the edge set:

def mutual_knn(x, k=5):
    graph = KNN.from_array(x, k=k)
    directed_edges_array = graph.sparse != graph.sparse.T
    removed = (graph.sparse - directed_edges_array) > 0
    removed.eliminate_zeros()
    return WSP2W(WSP(removed))
martinfleis commented 9 months ago

What about a keyword in symmetrize controlling if the symmetrization happens using addition or removal of asymmetric edges? It can be then exposed in the KNN builder via a keyword (feels better than another builder).

ljwolf commented 9 months ago

That's a good idea imho!

knaaptime commented 9 months ago

😆 i have an example in my new graph materials

Screenshot 2023-12-08 at 8 34 22 AM

(note we still dont have symmetrize in the Graph yet)

ljwolf commented 9 months ago

We probably want to think about the API, then... I would prefer something like the following, implemented in the utils.py module, amenable for any weight type:

def make_symmetric(w, drop=False):
    if drop:
        # keep only mutual neighbors
        ...
    else:
        # add all reversed links to edge set
        ...

Also, people often do (W+W.T)/2 to symmetrize weight values (in either inclusive or exclusive cases). This is not possible to induce via a standardisation trick, and idk how we might want to implement that...?

martinfleis commented 9 months ago

Are we talking about implementing it in weights or graph? If the latter, then I'd like to have it as a method.

ljwolf commented 9 months ago

definitely graph. I'd like to avoid implementing things in weights.

martinfleis commented 9 months ago

@knaaptime a complete out of topic but since I've noticed it in your code above. Getting focals out of the Graph via neighbors is going to be super inefficient. Especially compared to .unique_ids which gives you nearly the same thing. g_knn10.unique_ids.to_series() gives you the same output but on a graph I have loaded in memory right now with 13890804 edges, the neighbors path takes 20.1s while unique_ids 186ms.

knaaptime commented 9 months ago

lol i noticed that right after i posted that photo and updated it to unique ids. thanks for the close eye!