Open alsalehf opened 1 year ago
@alsalehf Hello! That's a very important question, but the developers don't seem to have had time to answer it. One workaround is to drop all the "bad" descriptors, i.e. the ones that fail for you because of problems with your molecules' stereochemistry or something else. This is my suggestion, based on my own experience:
from mordred import (
    Calculator, descriptors, get_descriptors_from_module,
    AdjacencyMatrix, Autocorrelation, BaryszMatrix, CPSA, DetourMatrix,
    DistanceMatrix, EState, GeometricalIndex, GravitationalIndex, MoRSE,
    MolecularDistanceEdge, MomentOfInertia, PBF, TopologicalCharge,
)

# collect the descriptor classes from all mordred submodules
descs = get_descriptors_from_module(descriptors, submodule=True)

# exclude the descriptors defined in these modules
excluded = {
    module.__name__
    for module in (
        AdjacencyMatrix, Autocorrelation, DetourMatrix, BaryszMatrix, CPSA,
        DistanceMatrix, EState, GeometricalIndex, GravitationalIndex, MoRSE,
        MolecularDistanceEdge, MomentOfInertia, PBF, TopologicalCharge,
    )
}
descs = [d for d in descs if d.__module__ not in excluded]

calc = Calculator(descs)
calc.pandas(mols)  # mols: an iterable of RDKit Mol objects
description
I get stuck in a loop when calling calc.pandas, and it ends with a runtime error:
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase
Below is the code I'm using for testing:
minimal reproduction code
from rdkit import Chem
from mordred import Calculator, descriptors
import pandas as pd
import unicodedata

components = ["CCO"]

def s2d(smiles_list):
    # normalize each SMILES string to NFKD form
    final_list = [unicodedata.normalize("NFKD", ls) for ls in smiles_list]
    return final_list

s2d(components)
df = s2d(components)
print(df)
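The RuntimeError quoted in the description is the message Python's multiprocessing module raises when worker processes are started by re-importing the main script (the default on Windows) and that script launches its parallel work at module level. The snippet above does not show the descriptor calculation itself, so the following is only a stdlib sketch of the usual remedy, assuming the full script makes such a call at top level. The names normalize_all, main and the nproc argument are hypothetical stand-ins, with Unicode normalization taking the place of the real descriptor work.

```python
import multiprocessing as mp
import unicodedata


def normalize_all(smiles_list, nproc=2):
    """Normalize SMILES strings to NFKD (a stand-in for the per-molecule work)."""
    if nproc == 1:
        # Serial path: no worker processes are started at all.
        return [unicodedata.normalize("NFKD", s) for s in smiles_list]
    args = [("NFKD", s) for s in smiles_list]
    with mp.Pool(processes=nproc) as pool:
        return pool.starmap(unicodedata.normalize, args)


def main():
    components = ["CCO"]
    # Assumption: in the real script, the descriptor calculation would go here.
    print(normalize_all(components))


if __name__ == "__main__":
    # Workers that re-import this file see a different __name__, so they
    # skip main() instead of trying to start a pool of their own.
    main()
```

In the mordred script, the Calculator construction and the calc.pandas(mols) call would sit inside main() in the same way. Passing nproc=1 to calc.pandas should also avoid the error, since the calculation then stays in a single process, at the cost of speed.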
environment
I'm running the code on a Windows 10 machine in a venv environment.
conda or pip
pip.
python version
Python 3.10.4
library version
Package         Version
--------------- --------
mordred         1.2.0
networkx        2.8.8
numpy           1.24.2
pandas          1.5.3
Pillow          9.4.0
pip             22.0.4
python-dateutil 2.8.2
pytz            2022.7.1
rdkit           2022.9.5
setuptools      58.1.0
six             1.16.0
pip show rdkit
Name: rdkit
Version: 2022.9.5
Summary: A collection of chemoinformatics and machine-learning software written in C++ and Python
Home-page: https://github.com/kuelumbus/rdkit-pypi
Author: Christopher Kuenneth
Author-email: chris@kuenneth.dev
License: BSD-3-Clause
Location: c:\users\admin\chemslenv\lib\site-packages
Requires: numpy, Pillow
Required-by: