Errors in running tests

amin-sagar commented 1 year ago

Hello deepQM developers, and thanks for this awesome work. I tried to run the tests in the PL directory but ran into some errors. The output looks like this:

Nuber of CUDA devices:  0
  0%|                                                                                                        | 0/11 [00:00<?, ?it/s]Nuber of CUDA devices:  0
trjmol0.pdb. pdb file is processing ...
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Warning: an error was encountered.
CUDA out of memory in case of aimnet model or your system may contain elements such as "F, Cl and S".
Returned 0.0 for energy
  0%|                                                                                                        | 0/11 [00:20<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/amin/anaconda3/envs/deepQM/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/amin/softwares/deepQM/deepQM.py", line 239, in _SPgroupedMultiMol
    result = calcSPWithModel(model, mol)
  File "/home/amin/softwares/deepQM/deepQM.py", line 79, in calcSPWithModel
    return sp_e
UnboundLocalError: local variable 'sp_e' referenced before assignment
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/amin/softwares/deepQM/deepQM.py", line 502, in <module>
    runSPgroupedMultiMol(n_procs)
  File "/home/amin/softwares/deepQM/deepQM.py", line 286, in runSPgroupedMultiMol
    for result in tqdm.tqdm(pool.imap_unordered(func=_SPgroupedMultiMol, iterable=idxs), total=len(idxs)):
  File "/home/amin/anaconda3/envs/deepQM/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/amin/anaconda3/envs/deepQM/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
UnboundLocalError: local variable 'sp_e' referenced before assignment
/home/amin/softwares/deepQM/scripts/bindEnAniD3.py:31: RuntimeWarning: Mean of empty slice.
  return np_array.mean()
/home/amin/anaconda3/envs/deepQM/lib/python3.8/site-packages/numpy/core/_methods.py:194: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret / rcount
/home/amin/anaconda3/envs/deepQM/lib/python3.8/site-packages/numpy/core/_methods.py:269: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
Traceback (most recent call last):
  File "/home/amin/softwares/deepQM/scripts/bindEnAniD3.py", line 58, in <module>
    diff_aniStd = std(diff_ani)
  File "/home/amin/softwares/deepQM/scripts/bindEnAniD3.py", line 36, in std
    return np_array.std()
  File "/home/amin/anaconda3/envs/deepQM/lib/python3.8/site-packages/numpy/core/_methods.py", line 269, in _std
    ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
  File "/home/amin/anaconda3/envs/deepQM/lib/python3.8/site-packages/numpy/core/_methods.py", line 226, in _var
    arrmean = um.true_divide(arrmean, div, out=arrmean,
ZeroDivisionError: division by zero

I looked into deepQM.py, and I think the cause is the following lines. If the call inside the try block fails, sp_e is never assigned. The finally block still runs after return 0.0, and its return sp_e then raises the UnboundLocalError.

    try:
        sp_e = np.round(mol.get_potential_energy(), 16)
    except:
        print(warning)
        return 0.0
    finally:
        os.chdir(pwd)
        shutil.rmtree(workdir) # interesting: runs after the return statement
        return sp_e
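The failure mode described above can be reproduced in isolation. This is a minimal sketch, not deepQM's code: `failing_energy` is a hypothetical stand-in for `mol.get_potential_energy()` crashing, and `calc_sp_buggy` only mirrors the structure of the block above. Because `finally` always runs, its `return sp_e` overrides `return 0.0` and touches a variable that was never bound.

```python
def failing_energy():
    # Hypothetical stand-in for mol.get_potential_energy() when dftd3 crashes
    raise RuntimeError("dftd3 failed")


def calc_sp_buggy(get_energy):
    """Mirrors the structure of the original try/except/finally block."""
    try:
        sp_e = round(get_energy(), 16)
    except Exception:
        print("Warning: an error was encountered.")
        return 0.0  # overridden: the finally block still runs afterwards
    finally:
        # Runs on every path; sp_e is unbound if get_energy() raised
        return sp_e


try:
    calc_sp_buggy(failing_energy)
except UnboundLocalError as exc:
    print(type(exc).__name__, exc)
```

On the success path the same function returns the energy normally. Only the failure path triggers the error.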

I changed it to:

    try:
        sp_e = np.round(mol.get_potential_energy(), 16)
    except:
        print(warning)
        sp_e = 0.0
        return 0.0
    finally:
        os.chdir(pwd)
        shutil.rmtree(workdir) # interesting: runs after the return statement
        return sp_e
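The change above works because sp_e is now bound on the failure path, and the return sp_e in finally still wins over return 0.0. A slightly cleaner variant binds the result before the try and keeps only cleanup in finally. This matters because a return inside finally silently discards any exception that is still propagating. The sketch below assumes that structure. `calc_sp`, `get_energy` and `failure_value` are hypothetical names, not deepQM's API.

```python
import os
import shutil
import tempfile


def calc_sp(get_energy, failure_value=0.0):
    """Hypothetical single-point wrapper: run get_energy() in a scratch dir."""
    pwd = os.getcwd()
    workdir = tempfile.mkdtemp(prefix="sp_")
    sp_e = failure_value  # bound on every path, so no UnboundLocalError
    os.chdir(workdir)
    try:
        sp_e = round(get_energy(), 16)
    except Exception as exc:  # KeyboardInterrupt/SystemExit still propagate
        print(f"Warning: an error was encountered ({exc}). "
              f"Returned {failure_value} for energy")
    finally:
        # Cleanup only; no return here, so unexpected errors are not swallowed
        os.chdir(pwd)
        shutil.rmtree(workdir, ignore_errors=True)
    return sp_e
```

Usage: `calc_sp(lambda: -1.25)` returns `-1.25`. A getter that raises makes it print the warning and return `failure_value`, and in both cases the working directory is restored.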

Now I get this:

Nuber of CUDA devices:  0
  0%|                                                                                                        | 0/11 [00:00<?, ?it/s]Nuber of CUDA devices:  0
trjmol0.pdb. pdb file is processing ...
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Warning: an error was encountered.
CUDA out of memory in case of aimnet model or your system may contain elements such as "F, Cl and S".
Returned 0.0 for energy
  9%|████████▋                                                                                       | 1/11 [00:20<03:23, 20.38s/it]trjmol1.pdb. pdb file is processing ...
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Warning: an error was encountered.
CUDA out of memory in case of aimnet model or your system may contain elements such as "F, Cl and S".
Returned 0.0 for energy
 18%|█████████████████▍                                                                              | 2/11 [00:39<02:57, 19.74s/it]

This is printed for every analyzed PDB file. The calculation completes and I get a Summary.dat file:

=============================================================
                      SUMMARY-kcal/mol
=============================================================
dftd3 (wb97x)            =     0.000000    +/-       0.000000
ani2x (wb97x/6-31G*)    =   -21.854917    +/-       2.772305
-------------------------------------------------------------
Binding energy           =    -8.279210    +/-       0.343766
-------------------------------------------------------------
=============================================================
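In this summary the dftd3 row reads 0.000000 +/- 0.000000. That is consistent with every frame having taken the failure path ("Returned 0.0 for energy"), so placeholder zeros are averaged as if they were real energies. One possible alternative, not what deepQM does, is to record failed frames as NaN and average only the finite values. An all-failed model then raises a clear error instead of the "Mean of empty slice" and division-by-zero failure seen in the first run. `summarize` is a hypothetical helper. It uses the population standard deviation, matching numpy's default `ddof=0`.

```python
import math
import statistics


def summarize(energies):
    """Mean and population std of the finite values, plus the failure count.

    Assumes (hypothetically) that failed frames are stored as NaN instead
    of 0.0, so they can be excluded rather than averaged in.
    """
    valid = [e for e in energies if math.isfinite(e)]
    n_failed = len(energies) - len(valid)
    if not valid:
        # Fail loudly instead of producing a silent 0.0 +/- 0.0 row
        raise ValueError(f"all {len(energies)} frames failed; nothing to average")
    return statistics.fmean(valid), statistics.pstdev(valid), n_failed
```

Whether to raise, warn, or drop the model from the summary is a design choice. Raising makes an all-failed run like the one above hard to miss.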

These segmentation faults come from the dftd3 part. Removing it from the model list gets rid of them, but then I can't run bindEnAniD3.py because the D3 part is missing. Can you please help me with this? Best, Amin

amin-sagar commented 1 year ago

I think I have found the solution. This seems to be a dftd3 issue: setting ulimit -s unlimited (which removes the stack-size limit) in the script solves the problem.
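ulimit -s unlimited raises the shell's soft stack-size limit, and every process started from that shell inherits it. That includes the Fortran dftd3 program. The forrtl messages come from the Intel Fortran runtime, and a stack overflow is the likely cause of the SIGSEGV. If you would rather do this from Python than from the run script, below is a minimal sketch using the standard-library `resource` module (Unix only). It assumes dftd3 and the multiprocessing workers are started after the call, so they inherit the new limit. It raises the soft limit only as far as the hard limit allows, which equals `ulimit -s unlimited` when the hard limit is unlimited. `raise_stack_limit` is a hypothetical helper, not part of deepQM.

```python
import resource


def raise_stack_limit():
    """Raise the soft stack-size limit to the hard limit.

    Python analogue of `ulimit -s unlimited` when the hard limit is
    unlimited. Child processes started after this call (e.g. an external
    dftd3 binary or multiprocessing workers) inherit the new limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    print("stack limit (soft, hard):", raise_stack_limit())
```

Call it once at the top of the driver, before the worker pool is created. Processes that are already running keep the limit they started with.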

otayfuroglu commented 1 year ago

Hi Amin, sorry for the late reply. You have already found the solution. Thanks for your interest and support. Best, Omer

otayfuroglu / deepQM

Errors in running tests #1