PDB file saving needs speeding up

samirelanduk / atomium

Python macromolecular parsing (with .pdb/.cif/.mmtf parsing and production)

https://atomium.bio

MIT License

102 stars 19 forks

PDB file saving needs speeding up #3

Closed samirelanduk closed 6 years ago

gf712 commented 6 years ago

Hey, I used line_profiler to look at why it takes so long to save a PDB file, and the issue seems to be in the function atom_to_atom_dict in structure2pdbdatafile.py, specifically on line 36: chain_id = atom.chain().chain_id() if atom.chain() else None. After looking into it further, it seems that most of the time is spent calling the __len__ method of ResidueSequence. I am not sure what is really happening, but I hope this helps narrow it down!

samirelanduk commented 6 years ago

Hi Gil, cheers for taking a look. I use snakeviz/cProfile on the parsing, but haven't yet put together anything similar for the saving, so this is good to know.

I can't see how line 36 would be taking any time, as it's all just simple lookups there, but I can see why the __len__ method would take a long time, as it has to call the residues() method, which probably takes a while. I will try to sort this out after 0.6 is released this week.

Cheers!
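
For anyone wanting to check a suspicion like this on the saving side, here is a minimal cProfile sketch. It does not use atomium's real classes: ResidueSequence and save below are hypothetical stand-ins, written on the assumption that __len__ rebuilds a collection on every call.

```python
import cProfile
import io
import pstats


class ResidueSequence:
    """Hypothetical stand-in, not atomium's class: length is recomputed each call."""

    def __init__(self, n):
        self._n = n

    def residues(self):
        # Rebuilds the whole collection every time, to mimic an expensive lookup.
        return set(range(self._n))

    def __len__(self):
        return len(self.residues())


def save(sequence, atom_count):
    """Hypothetical save loop that asks for the sequence length once per atom."""
    return [len(sequence) for _ in range(atom_count)]


profiler = cProfile.Profile()
profiler.enable()
save(ResidueSequence(500), 2000)
profiler.disable()

# Collect the report into a string, sorted by cumulative time per function.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(report)
```

In the printed table, the ncalls column shows how often __len__ and residues were hit, which is usually enough to tell a slow method from a cheap one that is called too often.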

samirelanduk commented 6 years ago

So this was driving me crazy, because it turns out it was line 36 after all, even though nothing there calls __len__ explicitly.

But it turns out that when you convert an object to a bool, its __len__ will be called if it has one (and it doesn't define __bool__). So I replaced if chain with if chain is not None, and now saving 1LOL takes 0.151 seconds instead of 6.33 seconds.
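
The behaviour is easy to reproduce in isolation. The sketch below uses a hypothetical Chain class (not atomium's) that counts how often its __len__ runs, to show that a truth test triggers it and an identity check does not.

```python
class Chain:
    """Hypothetical container that defines __len__ but not __bool__."""

    def __init__(self, residues):
        self._residues = list(residues)
        self.len_calls = 0

    def __len__(self):
        self.len_calls += 1
        return len(self._residues)


chain = Chain(["GLY", "ALA"])

if chain:  # truth test: Python falls back to __len__ when there is no __bool__
    pass
calls_after_truth_test = chain.len_calls

if chain is not None:  # identity check: no method on the object is called
    pass
calls_after_identity_check = chain.len_calls

# An empty container is falsy, so "if chain" would also treat it like None.
empty_chain_is_falsy = not Chain([])

print(calls_after_truth_test, calls_after_identity_check, empty_chain_is_falsy)
```

Besides being faster, the is not None form is also stricter: it tests only for a missing object, whereas the truth test would skip an empty one too.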