keiserlab / e3fp

3D molecular fingerprints
GNU Lesser General Public License v3.0

Fingerprint generation runs and fails on [Cl-].[K+] (and possibly similar molecules) #18

Open 8li opened 7 years ago

8li commented 7 years ago

Fingerprint generation (using generate.py in serial mode) for [Cl-].[K+] raises the following catastrophic error:

2017-05-16 14:51:41,242|INFO|Generating fingerprints for CHEMBL1200731.
2017-05-16 14:51:41,242|ERROR|Error generating fingerprints for CHEMBL1200731.
Traceback (most recent call last):
  File "/netapp/home/ali/projects/e3fp/e3fpgen/generate.py", line 147, in fprints_dict_from_mol
    fingerprinter.run(conf, mol)
  File "/netapp/home/ali/opt/miniconda2/envs/e3fp_pip/lib/python2.7/site-packages/e3fp/fingerprint/fprinter.py", line 156, in run
    self.initialize_conformer(conf)
  File "/netapp/home/ali/opt/miniconda2/envs/e3fp_pip/lib/python2.7/site-packages/e3fp/fingerprint/fprinter.py", line 229, in initialize_conformer
    bound_atoms_dict=self.bound_atoms_dict)
  File "/netapp/home/ali/opt/miniconda2/envs/e3fp_pip/lib/python2.7/site-packages/e3fp/fingerprint/fprinter.py", line 474, in __init__
    self.distance_matrix = array_ops.make_distance_matrix(atom_coords)
  File "/netapp/home/ali/opt/miniconda2/envs/e3fp_pip/lib/python2.7/site-packages/e3fp/fingerprint/array_ops.py", line 56, in make_distance_matrix
    return squareform(pdist(coords))
  File "/netapp/home/ali/opt/miniconda2/envs/e3fp_pip/lib/python2.7/site-packages/scipy/spatial/distance.py", line 1217, in pdist
    raise ValueError('A 2-dimensional array must be passed.')
ValueError: A 2-dimensional array must be passed.
2017-05-16 14:51:42,169|ERROR|Error running: ('/netapp/home/ali/projects/e3fp/confgen/hashed/e2/CHEMBL1200731.sdf.bz2',)
Traceback (most recent call last):
  File "build/bdist.macosx-10.7-x86_64/egg/python_utilities/parallel.py", line 328, in serial_run
    yield (result, data)
GeneratorExit

We should probably check for and skip these types of molecules, and allow the generate.py script to continue regardless.

mjke commented 7 years ago

Yes, for our purposes we want to scrub all molecules first using the standardiser.

sethaxen commented 7 years ago

@mjke yes, but the Fingerprinter should also anticipate cases like this, fail gracefully with an informative warning, and proceed.

We previously had issues with floating (unbound) molecules (see #9), and the Fingerprinter.exclude_floating parameter handles those cases. If I were to speculate, for molecules like this one, where every atom is unbound, that exclusion leaves no atoms to fingerprint. A simple check for an empty atoms list after the exclusion step should handle this.
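A rough sketch of the kind of check suggested above, not the actual e3fp code. The function names are hypothetical, and the structure of `bound_atoms_dict` (a mapping from atom id to the ids of its bonded neighbors) is an assumption based only on the keyword name visible in the traceback.

```python
import logging


def exclude_floating_atoms(atoms, bound_atoms_dict):
    """Keep only atoms that have at least one bonded neighbor.

    Assumes bound_atoms_dict maps an atom id to a collection of the
    atom ids it is bonded to (hypothetical structure).
    """
    return [atom for atom in atoms if bound_atoms_dict.get(atom)]


def atoms_to_fingerprint(atoms, bound_atoms_dict, name="molecule"):
    """Return the atoms to fingerprint, or None if none remain.

    Returning None lets the caller skip the molecule before any
    distance matrix is built from an empty coordinate list.
    """
    kept = exclude_floating_atoms(atoms, bound_atoms_dict)
    if not kept:
        logging.warning(
            "No atoms left to fingerprint for %s after excluding "
            "floating atoms; skipping.", name)
        return None
    return kept
```

For a salt such as [Cl-].[K+], neither atom has a bonded neighbor, so the helper would return None and the caller could skip the conformer instead of reaching `pdist` with an empty array.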

sethaxen commented 7 years ago

@8li to standardize, you'll need to pip install standardiser and then pass the --standardise parameter during conformer generation.

sethaxen commented 7 years ago

Commit 7ffc068 should prevent catastrophic failure in cases like this and all cases where an error occurs in fingerprinting. Still, we should anticipate this case and log an informative error message.
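For illustration only, here is a minimal sketch of the behavior described above: an error in one molecule is logged and the run continues with the next. This is not the code from commit 7ffc068; `fingerprint_all` and its arguments are hypothetical names.

```python
import logging


def fingerprint_all(named_mols, fingerprint_func):
    """Fingerprint each molecule, logging and skipping any that fail.

    named_mols: iterable of (name, mol) pairs.
    fingerprint_func: callable taking a mol and returning its
        fingerprints (stands in for the real fingerprinting step).
    Returns (results, failed): a dict of name -> fingerprints and a
    list of the names that raised an error.
    """
    results = {}
    failed = []
    for name, mol in named_mols:
        try:
            results[name] = fingerprint_func(mol)
        except Exception:
            # Log the full traceback, but do not abort the whole run.
            logging.exception(
                "Error generating fingerprints for %s; skipping.", name)
            failed.append(name)
    return results, failed
```

Collecting the failed names separately makes it easy to report at the end of a run which molecules were skipped.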