keiserlab / e3fp

3D molecular fingerprints
GNU Lesser General Public License v3.0

Generating conformers from rdkit mol object fails due to name property #35

Closed aparente-nurix closed 4 years ago

aparente-nurix commented 5 years ago

I'm trying to generate 3D conformers from an existing RDKit mol object (which is stored in a dataframe). The problem seems to be related to the `_Name` property, which is not set for this molecule. I know the molecule itself isn't the problem, since `confs_from_smiles` works just fine with the same SMILES string I used to generate the RDKit molecule.

This is on Python 3.7 with the latest e3fp build from the master branch (cloned from the repo).

`mols=generate_conformers(allFeats['feature_mol'][:1][0])`

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in
      1 get_ipython().run_line_magic('timeit', '')
----> 2 mols=generate_conformers(allFeats['feature_mol'][:1][0])
      3
      4
      5 #confs=confs_from_smiles(allFeats['Feature_Structures'][:1][0], 'conf', confgen_params=confgen_params)

~/Programs/e3fp/e3fp/conformer/generate.py in generate_conformers(input_mol, name, standardise, num_conf, first, pool_multiplier, rmsd_cutoff, max_energy_diff, forcefield, seed, out_file, out_dir, save, compress, overwrite)
    102     """
    103     if name is None:
--> 104         name = input_mol.GetProp("_Name")
    105
    106     if standardise:

KeyError: '_Name'
```

When I try to explicitly set the name, I get a similar error:

`mols=generate_conformers(allFeats['feature_mol'][:1][0], name='conformer')`

```
2019-08-14 18:29:39,627|INFO|Generating conformers for conformer.
2019-08-14 18:29:39,629|WARNING|Problem generating conformers for conformer.
Traceback (most recent call last):
  File "/home/aparente/Programs/e3fp/e3fp/conformer/generate.py", line 129, in generate_conformers
    mol, values = conf_gen.generate_conformers(input_mol)
  File "/home/aparente/Programs/e3fp/e3fp/conformer/generator.py", line 143, in generate_conformers
    mol = self.embed_molecule(mol)
  File "/home/aparente/Programs/e3fp/e3fp/conformer/generator.py", line 201, in embed_molecule
    logging.debug("Adding hydrogens for %s" % mol.GetProp('_Name'))
KeyError: '_Name'
```

sethaxen commented 5 years ago

E3FP requires that all molecules have names. This is both for logging and so that we can tell which fingerprints correspond to which molecule in the resulting E3FP database. Our convention of storing the name in the property `"_Name"` follows RDKit's own convention (unless they've changed this, which doesn't seem to be the case).

In case this is a bug, it would be helpful if you could provide a minimal failing example so I can reproduce it: either a mol/SDF file containing the molecule, or code that generates the molecule from SMILES together with the code that fails.

Without a minimal failing example, I'd say setting the name directly on the molecule is the way to go. For a molecule `mol` with name "MyMol", call `mol.SetProp("_Name", "MyMol")`. Then E3FP should know how to work with it.
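
For readers hitting the same `KeyError`, the suggestion above can be sketched as a small helper. This is only an illustration, not part of E3FP: the function names `ensure_name` and `demo` and the placeholder name `"mol_0"` are made up for this example. The only calls it assumes are RDKit's `HasProp`, `GetProp`, and `SetProp` on a mol object, plus `Chem.MolFromSmiles` in the demo.

```python
def ensure_name(mol, fallback):
    """Set "_Name" on `mol` if it is missing or empty; return the name in effect.

    `mol` is expected to be an RDKit Mol (anything exposing HasProp,
    GetProp and SetProp). `fallback` is a caller-chosen placeholder name.
    """
    if not mol.HasProp("_Name") or not mol.GetProp("_Name"):
        mol.SetProp("_Name", fallback)
    return mol.GetProp("_Name")


def demo():
    """Apply ensure_name to a molecule built from SMILES (requires RDKit)."""
    # Imported here so that ensure_name itself has no dependencies.
    from rdkit import Chem

    # A molecule parsed from a bare SMILES string carries no "_Name",
    # which is the situation described in this issue.
    mol = Chem.MolFromSmiles("CCO")
    had_name = bool(mol.HasProp("_Name"))
    name = ensure_name(mol, "mol_0")  # "mol_0" is an arbitrary placeholder
    return had_name, name
```

For molecules stored in a dataframe column, as in the original report, one option is to loop over the column and call something like `ensure_name(mol, "mol_%d" % i)` on each entry before passing it to `generate_conformers`. Because the name is set on the molecule itself, later `GetProp("_Name")` lookups inside E3FP should then succeed.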