pierrepo / PBxplore

A suite of tools to explore protein structures with Protein Blocks :snake:
https://pbxplore.readthedocs.org/en/latest/
MIT License
28 stars 17 forks source link

[Proposal] Optimize PBassign when reading xtc trajectory #115

Closed HubLot closed 8 years ago

HubLot commented 8 years ago

Hi,

I continue my quest of optimize PBxplore. In this PR, I focused mainly on the reading of xtc trajectory through MDAnalysis and I was able to gain ~15% speed-up.

First, I introduced classmethod property for the class Atom. I noticed we usually create an empty Atom() and then populate it with values depending on the input (PDB, PDBx, XTC). Creating an empty object and then modify its attributes is not really efficient. @classmethod allow us to redefine a function to be a constructor. Create a object and set its attribute is now done once.

For example: atom = Atom(); atom.read_PDB(line) is now atom = Atom.read_PDB(line)

I also removed some Atom() attributes which were never used (occupancy, temperature, etc). This does not change the output of format() function. I also realized that:

self.x, self.y, self.z = atm.position

is faster than

self.x = atm.position[0]
self.y = atm.position[1]
self.z = atm.position[2]

Second, I focused on the chains_from_trajectory() function. I noticed, at each frame, we created a new chain object (called structure) filled with new atoms while in fact, only the coordinates of these atoms changed during a trajectory. It was a bit overkill :) Now, only one structure object is created at the beginning. During the loop over the trajectory, only the coordinates of the atoms are updated. To do that, I introduced some new functions inside Atom() (a setter to modify the coordinates) and Chain() ( a function which modify all coordinates of all atoms). With these optimizations, the chains_from_trajectory() function cannot be faster unless touching MDAnalysis. Creating the universe object and looping over the trajectory take now 95% of the time of this function whereas it was ~50% before.

Eventually, all of these optimizations don't change the public API.

Let me know what you think!

P.S: I fixed an issue with the doc build and remove some weird print() (my fault) in tests.

pierrepo commented 8 years ago

Not so simple the use of decorator.... But why not.

HubLot commented 8 years ago

Yep, I know. This can be removed if you want. Since, with the second part, the constructor of Atom() is called much less often than before. So the gain of performance is minimal for a xtc trajectory, I don't know about many multiple PDBs. In fact, I came up with the second optimization after the first one... :)

jbarnoud commented 8 years ago

This is awesome! The code is much clearer.

pierrepo commented 8 years ago

@HubLot Do not touch anything. This is really good! I will eventually learn decorator ;-)