keiserlab / e3fp

3D molecular fingerprints
GNU Lesser General Public License v3.0

`e3fp` for diatomic molecules #58

Open FanwangM opened 3 years ago

FanwangM commented 3 years ago

Thanks for making this nice tool for the community.

I ran into problems computing e3fp fingerprints for diatomic molecules such as H2, O2, and CO. Here is the code I used and the corresponding error output:

from e3fp.pipeline import confs_from_smiles, fprints_from_mol

# configurations
confgen_params = {"max_energy_diff": 20.0, "first": 3}
fprint_params = {"bits": 4096, "radius_multiplier": 1.5, "rdkit_invariants": True}

# build molecular conformer
mol = confs_from_smiles("[HH]", "h2_gas", confgen_params=confgen_params)
# compute the fingerprint
fprints = fprints_from_mol(mol, fprint_params=fprint_params)
RDKit WARNING: [19:42:12] WARNING: not removing hydrogen atom without neighbors
2021-08-16 19:42:12,640|INFO|Generating conformers for h2_gas.
2021-08-16 19:42:12,662|INFO|Generated 1 conformers for h2_gas.
2021-08-16 19:42:12,664|INFO|Generating fingerprints for h2_gas.
2021-08-16 19:42:12,666|ERROR|Error generating fingerprints for h2_gas.
Traceback (most recent call last):
  File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/generate.py", line 188, in fprints_dict_from_mol
    fingerprinter.run(conf, mol)
  File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/fprinter.py", line 181, in run
    self.initialize_conformer(conf)
  File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/fprinter.py", line 262, in initialize_conformer
    bound_atoms_dict=self.bound_atoms_dict,
  File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/fprinter.py", line 547, in __init__
    self.distance_matrix = array_ops.make_distance_matrix(atom_coords)
  File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/array_ops.py", line 57, in make_distance_matrix
    return squareform(pdist(coords))
  File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/scipy/spatial/distance.py", line 2018, in pdist
    raise ValueError('A 2-dimensional array must be passed.')
ValueError: A 2-dimensional array must be passed.
-------------------------------------------------------------------
ValueError                        Traceback (most recent call last)
<ipython-input-14-99e54f4a484f> in <module>
      8 mol = confs_from_smiles("[HH]", "h2_gas", confgen_params=confgen_params)
      9 # compute the fingerprint
---> 10 fprints = fprints_from_mol(mol, fprint_params=fprint_params)

~/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/pipeline.py in fprints_from_mol(mol, fprint_params, save)
     57     fprints_dict = fprints_dict_from_mol(mol, save=save, **fprint_params)
     58     level = fprint_params.get("level", -1)
---> 59     fprints_list = fprints_from_fprints_dict(fprints_dict, level=level)
     60     return fprints_list
     61 

~/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/pipeline.py in fprints_from_fprints_dict(fprints_dict, level)
     48     """Get fingerprint at `level` from dict of level to fingerprint."""
     49     fprints_list = fprints_dict.get(
---> 50         level, fprints_dict[max(fprints_dict.keys())]
     51     )
     52     return fprints_list

ValueError: max() arg is an empty sequence

Do we have a fix for this? Thank you!

sethaxen commented 2 years ago

Hi, I'm very sorry for the late reply to this issue. I was only partially able to reproduce the error:

>>> import e3fp
>>> from e3fp.pipeline import fprints_from_mol, confs_from_smiles
>>> smiles_dict = {"h2": "[HH]", "o2": "O=O", "co": "[C-]#[O+]"}
>>> confgen_params = {'max_energy_diff': 20.0, 'first': 3}
>>> fprint_params = {"bits": 4096, "radius_multiplier": 1.5, "rdkit_invariants": True}
>>> mol = confs_from_smiles(smiles_dict["o2"], "o2", confgen_params=confgen_params)
2022-06-02 02:24:11,639|INFO|Generating conformers for o2.
2022-06-02 02:24:11,648|INFO|Generated 1 conformers for o2.
>>> fprints = fprints_from_mol(mol, fprint_params=fprint_params)
2022-06-02 02:24:19,635|INFO|Generating fingerprints for o2.
2022-06-02 02:24:19,640|INFO|Generated 1 fingerprints for o2.
>>> mol = confs_from_smiles(smiles_dict["co"], "co", confgen_params=confgen_params)
2022-06-02 02:24:29,416|INFO|Generating conformers for co.
2022-06-02 02:24:29,422|INFO|Generated 1 conformers for co.
>>> fprints = fprints_from_mol(mol, fprint_params=fprint_params)
2022-06-02 02:24:31,869|INFO|Generating fingerprints for co.
2022-06-02 02:24:31,873|INFO|Generated 1 fingerprints for co.
>>> mol = confs_from_smiles(smiles_dict["h2"], "h2", confgen_params=confgen_params)
[02:24:41] WARNING: not removing hydrogen atom without neighbors
2022-06-02 02:24:41,237|INFO|Generating conformers for h2.
2022-06-02 02:24:41,244|INFO|Generated 1 conformers for h2.
>>> fprints = fprints_from_mol(mol, fprint_params=fprint_params)
2022-06-02 02:24:42,228|INFO|Generating fingerprints for h2.
2022-06-02 02:24:42,229|ERROR|Error generating fingerprints for h2.
Traceback (most recent call last):
...

That is, I had no issues fingerprinting O2 and CO; only H2 failed.
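The difference between H2 and the other two molecules can be sketched in plain Python, assuming (as guessed below) that hydrogen coordinates are dropped before the distance matrix is built. The atom lists, the approximate bond lengths, and the helpers `heavy_atom_coords` and `distance_matrix` are hypothetical illustrations, not e3fp code.

```python
import math

# Hypothetical atom lists: (element symbol, (x, y, z)) in angstroms.
# Approximate bond lengths only; these are not coordinates produced by e3fp or RDKit.
MOLECULES = {
    "h2": [("H", (0.0, 0.0, 0.0)), ("H", (0.74, 0.0, 0.0))],
    "o2": [("O", (0.0, 0.0, 0.0)), ("O", (1.21, 0.0, 0.0))],
    "co": [("C", (0.0, 0.0, 0.0)), ("O", (1.13, 0.0, 0.0))],
}


def heavy_atom_coords(atoms):
    """Keep only non-hydrogen coordinates (assumed, not confirmed, to mirror e3fp)."""
    return [xyz for symbol, xyz in atoms if symbol != "H"]


def distance_matrix(coords):
    """Pairwise Euclidean distances, or None when no atoms are left."""
    if not coords:
        # Guard: with zero atoms there is no (n, 3) coordinate array to work with,
        # which is consistent with the "2-dimensional array" error in the traceback.
        return None
    return [[math.dist(a, b) for b in coords] for a in coords]


for name, atoms in MOLECULES.items():
    coords = heavy_atom_coords(atoms)
    print(name, len(coords), distance_matrix(coords) is None)
```

Running it prints `h2 0 True`, then `o2 2 False` and `co 2 False`: only H2 ends up with no coordinates at all once hydrogens are excluded.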

In general, diatomic molecules should be supported by e3fp. If I had to guess, H2 fails because we never use the atomic coordinates of hydrogens for fingerprinting. For a molecule made up entirely of hydrogen (i.e. H2 itself, or a bare proton), that leaves no coordinates at all, so fingerprinting fails; this fits the traceback, where scipy's `pdist` rejects the coordinate array passed in by `make_distance_matrix`. Here we could either

  1. explicitly add and use hydrogens, or
  2. produce a fingerprint with no "on" bits.

While the latter seems preferable for consistency, it would not be ideal if it gives a similarity other than 1 between the fingerprints of two hydrogen molecules (this needs checking). @mjke what do you think?
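To make the concern about option 2 concrete, here is a minimal sketch of Tanimoto similarity over fingerprints represented as sets of "on" bit indices (an assumed representation, not e3fp's own fingerprint objects). Two empty fingerprints give 0/0, so the result comes from whatever convention is chosen rather than from the molecules. The `empty_value` parameter is hypothetical.

```python
def tanimoto(bits_a, bits_b, empty_value=float("nan")):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        # Both fingerprints are empty: |A & B| / |A | B| is 0/0, so the value
        # is a convention. Default to NaN to make the ambiguity explicit.
        return empty_value
    return len(bits_a & bits_b) / len(union)


h2_a = set()  # option 2: a fingerprint with no "on" bits
h2_b = set()
print(tanimoto(h2_a, h2_b))  # nan
print(tanimoto(h2_a, h2_b, empty_value=1.0))  # 1.0
print(tanimoto({1, 5, 9}, {1, 5, 9}))  # 1.0
```

With the default, two empty H2 fingerprints compare as `nan` instead of 1; passing `empty_value=1.0` restores a unit similarity for identical empty fingerprints, at the cost of a special case that every metric would need.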