Open mrshirts opened 7 years ago
Yeah this is just being called way too many times to be implemented as a double for-loop. I think we can pretty easily reorganize the data so that this becomes a set lookup or the atoms are keys in a dict.
What would the pseudocode look like for this, say for two bonds? Do we explicitly add the list of bonds or pairs to each force? I can take a pass if you have explicit ideas.
I think the easiest way will be to build up an intermediate bond dictionary in the desmond parser that is keyed by a tuple of the atoms (probably by indices).
So specifically, here load them into a dict keyed by those indices; then a few lines down, when you're finding matches, you just do old_bond = intermediate_bond_dict[(new_bond.atom1.index, new_bond.atom2.index)]
After all are parsed and filtered, add the dict values to the bond_force set.
Lists aren't hashable, though? Or am I misunderstanding something?
I guess it could be done as a tuple...
Yes, the keys would be a tuple of indices
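For concreteness, here is a minimal sketch of the dict-keyed-by-tuple idea. It is not the actual desmond parser code: Atom, Bond, bond_key, and merge_bonds are hypothetical stand-ins, and the rule that a later bond replaces an earlier one on the same atom pair is an assumption made only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for the parser's atom and bond objects.
Atom = namedtuple("Atom", ["index"])
Bond = namedtuple("Bond", ["atom1", "atom2", "params"])


def bond_key(bond):
    """Return a hashable key: a tuple of the two atom indices."""
    return (bond.atom1.index, bond.atom2.index)


def merge_bonds(parsed_bonds):
    """Collect bonds in a dict keyed by atom-index tuples.

    Assumption: a later bond on the same atom pair replaces the earlier
    one. The real filtering rule in the parser may differ.
    """
    intermediate_bond_dict = {}
    for new_bond in parsed_bonds:
        key = bond_key(new_bond)
        # Average O(1) lookup instead of scanning every existing bond.
        old_bond = intermediate_bond_dict.get(key)
        if old_bond is not None:
            pass  # compare or merge old_bond and new_bond here as needed
        intermediate_bond_dict[key] = new_bond
    return intermediate_bond_dict


bonds = [
    Bond(Atom(1), Atom(2), "harmonic"),
    Bond(Atom(2), Atom(3), "harmonic"),
    Bond(Atom(1), Atom(2), "constraint"),  # same atom pair as the first bond
]
bond_dict = merge_bonds(bonds)

# After parsing and filtering, the dict values become the final collection.
bond_force_set = set(bond_dict.values())
```

If the same pair can show up as (1, 2) in one place and (2, 1) in another, sorting the indices before building the key would be one way to handle that; whether that applies here is not something this sketch assumes.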
Seems like match_pairs, match_angles, and match_dihedrals are really slowing things down.
Running cProfile on a large file, the matching helpers come out at the top.
So _match_two_atoms, _match_three_atoms, and _match_four_atoms take 1459/1583 = 92% of the time. For _match_two_atoms, most of the time comes from calls in match_pairs rather than match_bonds; the _match_three_atoms time comes from match_angles, and the _match_four_atoms time from match_dihedrals.
Not sure what the solution is, but just reporting.
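For anyone who wants to reproduce this kind of measurement, a rough sketch of collecting a cProfile report sorted by cumulative time is below. The slow_match function is a toy double for-loop made up for illustration, not the real matching code, and the exact way the profile above was collected is not stated, so treat this as one possible approach.

```python
import cProfile
import io
import pstats


def slow_match(pairs, targets):
    """Toy double for-loop, standing in for the real matching helpers."""
    hits = 0
    for a in pairs:
        for b in targets:
            if a == b:
                hits += 1
    return hits


profiler = cProfile.Profile()
profiler.enable()
result = slow_match(list(range(300)), list(range(300)))
profiler.disable()

# Write the ten most expensive entries (by cumulative time) to a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time puts functions whose callees dominate the run near the top, which is how the share per helper can be read off against the total.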