marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0

Slow Merge Molecule for large molecules #408

Closed: fgrunewald closed this issue 2 years ago

fgrunewald commented 2 years ago

For medium to large molecules (>1000 residues), `merge_molecule` becomes prohibitively slow. Fortunately, 93% of the time is spent finding the maximum node index, in this line:

https://github.com/marrink-lab/vermouth-martinize/blob/c437876bd494b7842737da48533c44d374fb0526/vermouth/molecule.py#L686
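To see why this one line dominates, here is a toy model, assuming a large molecule is assembled by merging residues one at a time and that every merge recomputes the largest node index with `max()` over all existing nodes. The function `naive_merge_cost` is hypothetical and only counts how many node indices those `max()` calls visit; it is not vermouth code.

```python
def naive_merge_cost(n_residues, atoms_per_residue):
    """Count node indices scanned by max() when residues are merged one at a time.

    Toy model: each merge recomputes the largest node index with max(), which
    visits every node already in the molecule, then appends one residue.
    """
    nodes = set()
    scanned = 0
    for _ in range(n_residues):
        if nodes:
            scanned += len(nodes)      # max() is a full O(n) scan
            offset = max(nodes) + 1
        else:
            offset = 0
        nodes.update(range(offset, offset + atoms_per_residue))
    return scanned


for n in (500, 1000, 2000):
    print(n, naive_merge_cost(n, atoms_per_residue=4))
# prints:
# 500 499000
# 1000 1998000
# 2000 7996000
```

Doubling the number of residues roughly quadruples the work, i.e. building the molecule this way is quadratic in its size, which fits the slowdown showing up only for large molecules.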

@pckroon, can't we just initialize `last_node` when the first node is added to the molecule and keep track of it from then on? I'm happy to make the PR once we agree on the best way to solve this.
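A minimal sketch of the proposed bookkeeping, assuming a simplified dict-based molecule: `ToyMolecule`, its `last_node` attribute and its `merge_molecule` are hypothetical illustrations of the idea, not vermouth's actual `Molecule` class or the eventual PR. The cached maximum is updated in O(1) whenever a node is added, so a merge never has to scan all existing nodes.

```python
class ToyMolecule:
    """Hypothetical, minimal stand-in for a molecule graph (not vermouth's Molecule)."""

    def __init__(self):
        self.nodes = {}         # node index -> attribute dict
        self.edges = set()      # undirected edges stored as sorted (a, b) tuples
        self.last_node = None   # largest node index added so far; None while empty

    def add_node(self, idx, **attrs):
        self.nodes[idx] = attrs
        # O(1) bookkeeping replaces an O(n) max() over all nodes at merge time.
        if self.last_node is None or idx > self.last_node:
            self.last_node = idx

    def add_edge(self, a, b):
        self.edges.add((min(a, b), max(a, b)))

    def merge_molecule(self, other):
        """Append `other`, renumbering its nodes to follow our largest index.

        Returns a dict mapping other's node indices to their new indices.
        """
        start = 0 if self.last_node is None else self.last_node + 1
        correspondence = {}
        for new_idx, old_idx in enumerate(other.nodes, start=start):
            correspondence[old_idx] = new_idx
            self.add_node(new_idx, **other.nodes[old_idx])
        for a, b in other.edges:
            self.add_edge(correspondence[a], correspondence[b])
        return correspondence


# Build a three-bead "residue" and merge it 1000 times into an empty molecule.
residue = ToyMolecule()
for i in range(3):
    residue.add_node(i, atomname=f"B{i}")
residue.add_edge(0, 1)
residue.add_edge(1, 2)

polymer = ToyMolecule()
for _ in range(1000):
    polymer.merge_molecule(residue)
```

One design caveat: vermouth's `Molecule` builds on `networkx.Graph`, where nodes can also enter through `add_nodes_from` or implicitly through `add_edge`, so a real implementation would have to keep the cache up to date on those paths as well.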

pckroon commented 2 years ago

Sounds good to me. Just in case, it would also be good to check whether a node idx is already in use (`idx in self`), which should be a cheap O(1) operation. Maybe even make that an assert.
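The suggested guard could look roughly like this, assuming nodes live in a dict keyed by index (dict membership is O(1) on average). The helper `add_node_tracked` and its parameters are hypothetical names chosen for the sketch.

```python
def add_node_tracked(nodes, last_node, idx, **attrs):
    """Add `idx` to the `nodes` dict and return the updated largest index.

    Hypothetical helper: `nodes` maps node index -> attributes and `last_node`
    is the cached maximum (None for an empty molecule).
    """
    # Cheap O(1) membership test guarding against silently overwriting a node.
    assert idx not in nodes, f"node index {idx} is already in use"
    nodes[idx] = attrs
    return idx if last_node is None or idx > last_node else last_node


nodes = {}
last_node = None
for idx in (0, 5, 2):
    last_node = add_node_tracked(nodes, last_node, idx, atomname="X")
```

Note that Python drops `assert` statements when run with `-O`; if the check must hold in optimized runs too, raising a `ValueError` instead would be the safer choice.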

fgrunewald commented 2 years ago

@pckroon, what do you think the behaviour should be when a whole residue is deleted? It's quite an edge case, but it can happen in theory.

pckroon commented 2 years ago

Good question. I don't think it matters for node IDs, but resids will be a mess. Maybe just ignore the edge case for the time being.
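A small toy illustration of why node IDs are unaffected while resids are not, assuming the cached maximum is simply left untouched on deletion. The helpers `delete_residue` and `next_indices` are hypothetical and only model the scenario discussed here.

```python
def delete_residue(nodes, resid):
    """Remove every node whose 'resid' equals `resid`; return the removed indices.

    Hypothetical helper: the cached maximum node index is deliberately left alone.
    """
    doomed = [idx for idx, attrs in nodes.items() if attrs.get("resid") == resid]
    for idx in doomed:
        del nodes[idx]
    return doomed


def next_indices(last_node, count):
    """Indices for `count` new nodes, placed after the (possibly stale) cached maximum."""
    return list(range(last_node + 1, last_node + 1 + count))


# Toy molecule: 3 residues x 3 nodes, node index -> attributes.
nodes = {idx: {"resid": idx // 3 + 1} for idx in range(9)}
last_node = max(nodes)              # cached once, as proposed above
removed = delete_residue(nodes, resid=3)

# Node IDs: a stale-high cache still yields unused indices,
# leaving only a harmless gap where the deleted residue used to be.
new_ids = next_indices(last_node, 3)
print(removed, new_ids, all(idx not in nodes for idx in new_ids))

# Resids: the cached node is gone, so reading nodes[last_node]["resid"]
# to offset the next residue would raise KeyError.
print(last_node in nodes)
```

In other words, a cache that is too high never produces colliding node IDs, whereas anything that derives the next resid from the last node would need its own handling.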