phi-grib / Data_curation

Python code for handling data curation: SMILES of problematic molecules, dataset selection, train and test split.
GNU General Public License v3.0

Sanitization error of molecules #3

Closed EMVGaron closed 3 years ago

EMVGaron commented 4 years ago

Some molecules fail at the sanitization step. There should be a function that handles these errors and allows the molecule to be classified despite them.

Example:

from rdkit import Chem
error_struc = "[Na+].[Na+].[Na+].[Na+].[Cu]1Oc2cc(ccc2N\\N=C\\3C(=O1)c4c(N)cc(cc4C=C3[S]([O-])(=O)=O)[S]([O-])(=O)=O)c5ccc6N\\N=C\\7C(=O[Cu]Oc6c5)c8c(N)cc(cc8C=C7[S]([O-])(=O)=O)[S]([O-])(=O)=O"
Chem.MolFromSmiles(error_struc)
RDKit ERROR: [13:08:10] Explicit valence for atom # 16 O, 3, is greater than permitted

This requires either a specific handler for this error or a workaround that corrects the molecule to make it valid for sanitization.

EMVGaron commented 3 years ago

Sanitization errors are handled by passing the option sanitize=False to the RDKit functions. Automating the correction of valence errors is not possible, since each molecule has its own issues. Instead, the user will be given the structures that raise errors so that they can be corrected manually.
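
For illustration only, the approach described above could look roughly like the sketch below. It is not the code in this repository: the names `split_by_sanitization` and `rdkit_split` are hypothetical, and the parser and sanitizer are passed in as callables so the core loop does not depend on a particular RDKit version. The wrapper assumes `Chem.MolFromSmiles(..., sanitize=False)` and `Chem.SanitizeMol`, which are standard RDKit calls.

```python
def split_by_sanitization(smiles_list, parse, sanitize):
    """Split SMILES into parsed molecules and structures flagged for manual review.

    parse(smiles) should return a molecule object, or None if parsing fails.
    sanitize(mol) should raise an exception when the molecule is invalid.
    Both callables are assumptions of this sketch.
    """
    clean = []    # molecules that passed sanitization
    flagged = []  # (smiles, error message) pairs to hand back to the user
    for smi in smiles_list:
        mol = parse(smi)
        if mol is None:
            flagged.append((smi, "could not be parsed"))
            continue
        try:
            sanitize(mol)
        except Exception as err:  # e.g. an explicit-valence error
            flagged.append((smi, str(err)))
        else:
            clean.append(mol)
    return clean, flagged


def rdkit_split(smiles_list):
    """Hypothetical wrapper that plugs RDKit into the helper above."""
    from rdkit import Chem  # imported here so the helper has no hard dependency

    return split_by_sanitization(
        smiles_list,
        parse=lambda smi: Chem.MolFromSmiles(smi, sanitize=False),
        sanitize=Chem.SanitizeMol,
    )
```

Calling `rdkit_split` on a list of SMILES would return the usable molecules together with the structures and messages that need manual correction, which matches the workflow proposed in this comment.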