shirtsgroup / OpenFFCrystalBenchmarking


Optimize code #6

Open mrshirts opened 2 years ago

mrshirts commented 2 years ago

Look at ways to optimize the derivative code (and other parts of the code) in Python.

Yu-Tang-Lin commented 2 years ago

Note for me: in supercell_generation.py, line 41, mol_sc = next(pybel.readfile('pdb', path)), is really slow (about 12 hours to read around 300 files). I need to find a faster approach later.

mrshirts commented 2 years ago

Line 41 mol_sc = next(pybel.readfile('pdb', path)) is really slow. (12 hours for extracting text from around 300 files) Need to find a better algorithm to accelerate it later.

Interesting. There should be some OpenFF tools for reading PDB files, I would think? To confirm that this line is the bottleneck, try reading just a single file: it should take about 12 hours / 300 files = 2.4 min. We do want to try to use OpenFF tools (or RDKit?) to avoid extra external dependencies.
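A minimal sketch of that single-file check, using only the standard library. The helper name `time_reads` and the stand-in reader `read_text` are hypothetical and are not part of the repository; the idea is to pass in whatever callable actually loads the file and see how long each call takes.

```python
import time


def time_reads(paths, reader):
    """Call reader(path) for each path and return {path: elapsed_seconds}."""
    timings = {}
    for path in paths:
        start = time.perf_counter()
        reader(path)  # the call being measured
        timings[path] = time.perf_counter() - start
    return timings


def read_text(path):
    """Stand-in reader (assumption): just loads the raw file text."""
    with open(path) as handle:
        return handle.read()


# To time the call discussed in this thread, the reader could instead be
# something like (requires Open Babel's pybel to be installed):
#   lambda p: next(pybel.readfile('pdb', p))
```

If one file takes far less than 2.4 min with the pybel reader, the slowdown is probably somewhere else in the loop rather than in the file read itself.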

Yu-Tang-Lin commented 2 years ago

Yes, I am also confused about why it takes so long to read the file.

Note for @Yu-Tang-Lin : Another possible reason is that my GPU driver crashed again. Even though I do not think line 41 uses the GPU, it is still better to eliminate possible problems one by one.

Yu-Tang-Lin commented 2 years ago

Note for @Yu-Tang-Lin : All of the code, including the PDB file generation and the OpenMM energy minimization, currently runs on the CPU. Is it possible to use the GPU instead?

I guess it is possible for the OpenMM energy minimization, since the tutorial also uses the GPU.

mrshirts commented 2 years ago

GPU optimization should be the last step, because it significantly complicates compilation and installation. The most important thing is to first push everything down to NumPy and SciPy operations, replacing explicit Python loops with vectorized calls.
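As an illustration of what "pushing down to NumPy" can look like, here is a small sketch comparing an explicit loop with a vectorized version of a central finite-difference derivative. This is not the repository's derivative code; the function names and the choice of a central difference on a uniform grid are assumptions made only for the example.

```python
import numpy as np


def central_diff_loop(y, h):
    """Central-difference derivative at interior points, with an explicit loop."""
    out = np.empty(len(y) - 2)
    for i in range(1, len(y) - 1):
        out[i - 1] = (y[i + 1] - y[i - 1]) / (2.0 * h)
    return out


def central_diff_vectorized(y, h):
    """Same result using array slicing, with no Python-level loop."""
    y = np.asarray(y, dtype=float)
    # y[2:] holds y[i+1] and y[:-2] holds y[i-1] for every interior index i.
    return (y[2:] - y[:-2]) / (2.0 * h)
```

Both functions return the same values; the vectorized one does the arithmetic inside NumPy, which is typically much faster for large arrays.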