mordred-descriptor / mordred

a molecular descriptor calculator
http://mordred-descriptor.github.io/documentation/master/
BSD 3-Clause "New" or "Revised" License

RuntimeError when using calc.pandas #106

Open alsalehf opened 1 year ago

alsalehf commented 1 year ago

description

I get stuck in a loop when using calc.pandas, which ends in a runtime error: RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

Below is the code I'm using for testing:

minimal reproduction code

from rdkit import Chem
from mordred import Calculator, descriptors

import pandas as pd

import unicodedata

components = ["CCO"]

def s2d(smiles_list):
    final_list = [unicodedata.normalize("NFKD", ls) for ls in smiles_list]

    mols = [Chem.MolFromSmiles(smi) for smi in final_list]
    calc = Calculator(descriptors, ignore_3D=True)
    df3 = calc.pandas(mols)
    return df3

s2d(components)

df = s2d(components)
print(df)
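The RuntimeError text above is raised by Python's multiprocessing module. On Windows, worker processes are started by re-importing the main script, so any top-level call that itself starts workers gets triggered again inside each child. calc.pandas appears to run in parallel by default, which would explain why the unguarded s2d(components) calls loop and then fail. The following is a minimal, stdlib-only sketch of the usual guard; square and run_parallel are hypothetical stand-ins, not mordred functions.

```python
import multiprocessing


def square(x):
    """Stand-in for the per-molecule work done in a worker process."""
    return x * x


def run_parallel(values, processes=2):
    # Pool starts worker processes; on Windows each worker re-imports this file.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Only the parent process runs this block; re-importing workers skip it.
    print(run_parallel([1, 2, 3]))
```

Applied to the reproduction script, the two s2d(components) calls and the print would move under the same `if __name__ == "__main__":` guard. Passing nproc=1 to calc.pandas should also avoid starting worker processes, assuming the installed mordred version accepts that argument.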


environment

I'm running the code on a Windows 10 machine in a venv environment.


conda or pip

pip.

python version

Python 3.10.4

library version

Please execute the command and paste result.

Package         Version
mordred         1.2.0
networkx        2.8.8
numpy           1.24.2
pandas          1.5.3
Pillow          9.4.0
pip             22.0.4
python-dateutil 2.8.2
pytz            2022.7.1
rdkit           2022.9.5
setuptools      58.1.0
six             1.16.0

pip show rdkit
Name: rdkit
Version: 2022.9.5
Summary: A collection of chemoinformatics and machine-learning software written in C++ and Python
Home-page: https://github.com/kuelumbus/rdkit-pypi
Author: Christopher Kuenneth
Author-email: chris@kuenneth.dev
License: BSD-3-Clause
Location: c:\users\admin\chemslenv\lib\site-packages
Requires: numpy, Pillow
Required-by:

ismorphism commented 1 year ago

@alsalehf Hello! That's a very important question, but the developers do not seem to have had time to answer it. My suggestion is to drop all the "bad" descriptors: they may not work for you because of problems with your molecules' stereochemistry or something else. This idea is based on my own experience:

from mordred import (
    Calculator, PBF, MomentOfInertia, TopologicalCharge,
    MolecularDistanceEdge, MoRSE, GravitationalIndex, GeometricalIndex,
    EState, DistanceMatrix, DetourMatrix, CPSA, BaryszMatrix,
    Autocorrelation, AdjacencyMatrix, descriptors,
    get_descriptors_from_module,
)

descs = get_descriptors_from_module(descriptors, submodule=True)

# exclude some from descs
descs = filter(lambda d: ((d.__module__ != AdjacencyMatrix.__name__) and 
                          (d.__module__ != Autocorrelation.__name__) and
                          (d.__module__ != DetourMatrix.__name__) and 
                          (d.__module__ != BaryszMatrix.__name__) and 
                          (d.__module__ != CPSA.__name__) and 
                          (d.__module__ != DistanceMatrix.__name__) and 
                          (d.__module__ != EState.__name__) and 
                          (d.__module__ != GeometricalIndex.__name__) and 
                          (d.__module__ != GravitationalIndex.__name__) and 
                          (d.__module__ != MoRSE.__name__) and 
                          (d.__module__ != MolecularDistanceEdge.__name__) and 
                          (d.__module__ != MomentOfInertia.__name__) and 
                          (d.__module__ != PBF.__name__) and 
                          (d.__module__ != TopologicalCharge.__name__)), descs)

calc = Calculator(descs)
calc.pandas(mols)
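The long chain of != comparisons above can be written more compactly as a set-membership test on each descriptor's __module__. The sketch below is one possible way to do that. drop_modules is a hypothetical helper, not part of mordred, and the stand-in modules and classes exist only so the example runs without mordred installed.

```python
from types import ModuleType


def drop_modules(descs, excluded_modules):
    """Keep only descriptors whose defining module is not excluded."""
    excluded_names = {module.__name__ for module in excluded_modules}
    return [d for d in descs if d.__module__ not in excluded_names]


# Stand-in modules and descriptor classes (hypothetical, for illustration only).
fake_cpsa = ModuleType("fake.CPSA")
fake_weight = ModuleType("fake.Weight")
PNSA = type("PNSA", (), {"__module__": fake_cpsa.__name__})
Weight = type("Weight", (), {"__module__": fake_weight.__name__})

# Only the class defined outside the excluded module survives the filter.
kept = drop_modules([PNSA, Weight], [fake_cpsa])
print([d.__name__ for d in kept])
```

With mordred, the same helper would presumably take the list returned by get_descriptors_from_module(descriptors, submodule=True) together with the modules to exclude (AdjacencyMatrix, Autocorrelation, and so on), and its result would be passed to Calculator.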