speed-up filterNodesFromFile

mauranolab / mapping

2 stars 1 forks source link

speed-up filterNodesFromFile #234

Closed andremrsantos closed 3 years ago

andremrsantos commented 3 years ago

This is a quick speed-up when filtering nodes based on whitelist or blacklist

mattmaurano commented 3 years ago

How does this work? Doesn't Graph.remove_edge() require an edge, not a node?

andremrsantos commented 3 years ago

How does this work? Doesn't Graph.remove_edge() require an edge, not a node?

I replaced the iterative check of all edges with G.edges(nodesToRemove) that return a "list" of edges incident to the nodes within nodesToRemove (see documentation).

I've made that substitution because the current algorithm (see original below) has a complexity of O(EN) since it checks all edges (E) if either nodes are contained in nodesToRemove that is a list and takes O(N) to check inclusion. Alternatively, we can convert nodesToRemove to a set which would make inclusion O(1), but I think the library function is probably faster.

[edge for edge in G.edges if edge[0] in nodesToRemove or edge[1] in nodesToRemove]

mattmaurano commented 3 years ago

right, that makes sense. But when I look at the docs https://networkx.org/documentation/stable//reference/classes/generated/networkx.Graph.remove_edge.html , remove_edge should be passed edges, not nodes, right?

andremrsantos commented 3 years ago

Yes, for either G.remove_edge or G.remove_edges_from you need to pass edges, but I am not using either of them here. I am using the library remove_edges instead.

mattmaurano commented 3 years ago

Oh, I see -- remove_edges() is not a networkx function, it is defined below. But it still assumes it is getting an edge. Does [e for e in G.edges if e in edgesToRemove] turn the list of nodes into a list of edges?

andremrsantos commented 3 years ago

No, I am passing a list of edges to remove_edges() that I generated using the function G.edges(nodesToRemove).

mattmaurano commented 3 years ago

Oh, I see now -- the way github colored the diff got me really confused.