Open fgrunewald opened 2 years ago
This simple implementation doesn't work, because add_edge adds the node neighbour without attributes.
My guess is that this is faster since you don't iterate over all the edges in graph, but only over edges connected to nodes
. You could try removing the subgraph.add_node
, and add another loop over nodes at the end to set all the node attributes.
@pckroon not sure I understand. If node becomes neighbour doesn't this subgraph.add_node(node, **graph.nodes[node])
add the attributes needed? For polyply it seems to work and most martinis processors as well except the ISMAGS
I'm not completely sure that will do what you want. Would require some experimentation. Oh, and you lose all edge attributes, that could also be it.
If I find the time I'll play around with it even though I still don't understand why the molecule subgraph is so slow. A regular networks graph even with a ton of attributes is much faster than the equivalent molecule/block. Any idea if we killed some performance enhancing functions?
A regular networks graph even with a ton of attributes is much faster
For the same number of nodes/edges?
Molecule.subgraph
also deals with interactions, which probably also slows things down.
I think the main reason is that the default nx subgraph actually creates a view with shared node/edge attributes, while Molecule.subgraph
creates an actual Graph with its own, copied, node attributes.
See https://github.com/networkx/networkx/blob/main/networkx/classes/graph.py#L1694
@pckroon I think you're right here. Our subgraph copies interactions by looping over all of them plus it does an actual copy of the nodes whereas the nx function returns a view. Now the question is do we need this thorough subgraph really? I will investigate a little bit by stripping out some parts and see where it breaks. But my feeling is that interaction copying could be optional with a keyword arg and/or becomes more efficient with the hashable interactions data structure.
Now the question is do we need this thorough subgraph really?
Good question. My guess is we did it this way so we wouldn't have to hunt it down as an obscure bug sometime in the future.
My concern with adding a keyword for the interactions is that Molecule.subgraph
will have a different signature than Graph.subgraph
. I'll probably survive though ;)
Note that we generally call graph.subgraph (or molecule, whatever), but the residue graph function calls nx.subgraph https://github.com/marrink-lab/vermouth-martinize/blob/master/vermouth/graph_utils.py#L208, which actually calls graph.subgraph (https://github.com/networkx/networkx/blob/bc29f29698b4683286954dddaea64cd108d60f93/networkx/classes/function.py#L347). So potato, potato.
it appears nx.subgraph is rather slow, which directly affects a bunch of processes in martinize2. Mostly it is used when making the residue graph. Depending on the complexity of the residues a dumb implementation like shown below is about 5-10 times faster I assume because it scales much better than whatever nx does.
Why do you then not simply implement that you may ask: Something something ISMAGS now dies.