protocaller / ProtoCaller

Full automation of relative protein-ligand binding free energy calculations in GROMACS
http://protocaller.readthedocs.io
GNU General Public License v3.0

Memory usage constantly growing with the number of perturbations. #31

Closed: kexul closed this issue 3 years ago

kexul commented 3 years ago

Here is the code I used:

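# Imports (the ProtoCaller import paths are assumed to follow the project's documented examples)
import pandas as pd
from rdkit import Chem
from ProtoCaller.Utils.fileio import Dir
from ProtoCaller.Ensemble import Ligand, Protein, Ensemble

# Load the ligands and the list of perturbations (ligand pairs)
gands = Chem.SDMolSupplier('CDK2_ligands.sdf')
pair = pd.read_csv('CDK2_pertu.csv', comment='#', names=['p1', 'p2', 'ddg', 'error'])

with Dir('CDK2', overwrite=False):
    # Build one Ligand per molecule, reusing the existing parameter files
    mol_dict = {}
    for item in gands:
        lig_name = item.GetProp('_Name')
        mol_dict[lig_name] = Ligand(item, protonated=True, minimise=False, name=lig_name, workdir='Ligands', parametrised_files=[f'{lig_name}.prmtop', f'{lig_name}.inpcrd'])

    # Pair up the ligands for each perturbation listed in the CSV
    pairs = []
    for _, row in pair.iterrows():
        pairs.append([mol_dict[row.p1], mol_dict[row.p2]])

    # Prepare the protein and generate the complexes for every perturbation
    protein = Protein('4EOR', ligand_ref='301')
    system = Ensemble(protein=protein, morphs=pairs, box_length_complex=8, workdir='Protein', ligand_ff='gaff2')
    system.protein.filter(ligands=None, waters='all')
    system.protein.prepare()
    system.protein.parametrise()
    system.prepareComplexes()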

The memory used by this program grew steadily with the number of perturbations generated. I had more than 40 perturbations in the CSV, and the program was killed after about 20 perturbations had been generated on my machine with 8 GB of memory.

Here are the input files I used: cdk2.zip

msuruzhon commented 3 years ago

Hi, could you reproduce this with some simpler ligands? Waiting for all of these ligands to parametrise will take quite a long time. Alternatively, you could upload all of the parameter files instead. I tried reproducing this on a T4-lysozyme system bound to chains of hydrocarbons and I don't seem to have any memory issues. My garbage collection seems to be releasing memory just fine after each iteration.
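One way to check whether memory is released after each iteration is to record the traced allocation size once per loop pass. The following is a minimal sketch using only the Python standard library; `traced_memory_per_iteration`, `leaky` and `clean` are hypothetical names made up for illustration and are not part of ProtoCaller. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries may not show up.

```python
import gc
import tracemalloc


def traced_memory_per_iteration(work, n_iterations):
    """Call work(i) n_iterations times; return traced memory (bytes) after each call."""
    tracemalloc.start()
    usage = []
    try:
        for i in range(n_iterations):
            work(i)
            gc.collect()  # collect garbage so only live objects are counted
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
    finally:
        tracemalloc.stop()
    return usage


if __name__ == "__main__":
    retained = []  # hypothetical stand-in for objects kept between iterations

    def leaky(i):
        # Keeps a reference, so memory grows with every iteration.
        retained.append(bytearray(1_000_000))

    def clean(i):
        # Drops the buffer on return, so memory stays roughly flat.
        _ = bytearray(1_000_000)

    print("retaining:", traced_memory_per_iteration(leaky, 5))
    print("releasing:", traced_memory_per_iteration(clean, 5))
```

A steadily rising series suggests that objects are being kept alive between iterations, while a flat series suggests they are being released.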

kexul commented 3 years ago

Hi @msuruzhon, these are the parameter files I used: Ligands.zip. Hope it helps!

msuruzhon commented 3 years ago

Thanks @kexul, I profiled these overnight and they ran just fine on my 8 GB machine. If you look at the attached graph you can see that the memory only goes up to about 5 GB at most, which is to be expected for this large system. The steady increase over time is most likely due to storing the morph objects. This is relatively cheap and is needed because alignment and morphing can be slow in some cases. Do you think that your issues might be related to some other programs running? You still need most of your memory to be free in order to run these, or alternatively you could increase your swap file size.

[Attached graph: Memory]
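To see how much memory and swap are actually free before starting a long run, something along these lines may help. It is a rough sketch that assumes a Unix-like system: the `resource` module is not available on Windows and `/proc/meminfo` exists only on Linux. The helper names `peak_rss_mib` and `parse_meminfo` are made up for this example.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dictionary."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])  # most fields are in kB
    return info


if __name__ == "__main__":
    print(f"peak RSS so far: {peak_rss_mib():.1f} MiB")
    try:
        with open("/proc/meminfo") as handle:  # Linux only
            info = parse_meminfo(handle.read())
        print("MemAvailable (MiB):", info.get("MemAvailable", 0) / 1024)
        print("SwapTotal (MiB):", info.get("SwapTotal", 0) / 1024)
    except FileNotFoundError:
        print("/proc/meminfo is not available on this platform")
```

A `SwapTotal` of zero means that no swap is configured, in which case the kernel may kill the process as soon as physical memory runs out.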

kexul commented 3 years ago

Hi @msuruzhon, thanks for your work! Your suggestion reminded me that I did not have a swap file on my system. I'll try adding one.

By the way, I used this code to reproduce the benchmark and got a Pearson R of 0.65 on CDK2. Thanks for your great work!
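For reference, the Pearson R between experimental and calculated relative binding free energies can be computed as in the sketch below. The function name `pearson_r` and the numbers in the usage example are invented purely for illustration; they are not the CDK2 benchmark data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance, up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same factor)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values for illustration only, not benchmark results.
    experimental = [1.0, 2.0, 3.0, 4.0]
    calculated = [1.0, 3.0, 2.0, 4.0]
    print(f"Pearson R = {pearson_r(experimental, calculated):.2f}")
```

With real data, the two lists would hold the experimental values (the `ddg` column of the CSV) and the calculated values for the same perturbations, in the same order.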

msuruzhon commented 3 years ago

Thank you for your kind words @kexul! I am always happy to hear ProtoCaller produces structures that perform well on benchmark calculations, especially since it seems that this performance is comparable to commercial packages :)

kexul commented 3 years ago

Yeah, that's amazing @msuruzhon !!