Open christophhillisch opened 5 years ago
Thank you for your question.
There are two options for calculating a subset of descriptors. Option 1 is to register only the descriptors you need:
from mordred import Calculator, ABCIndex, AcidBase, Weight, ZagrebIndex, WienerIndex
calc = Calculator([
ABCIndex, # register all presets of descriptors in module (register ABC and ABCGG)
AcidBase.AcidicGroupCount, # register all presets of the descriptor (register nAcid)
Weight.Weight(), # register the descriptor (MW)
ZagrebIndex.ZagrebIndex(1, 1), # Zagreb1
WienerIndex.WienerIndex(False), # WPath
], ignore_3D=True)
print(len(calc.descriptors)) # 6
Option 1 is verbose, so as option 2 you can also register all descriptors and then filter them by name:
from mordred import Calculator, descriptors
descriptor_list = {'ABC', 'nAcid', 'MW', 'Zagreb1', 'WPath'}
calc = Calculator(descriptors, ignore_3D=True) # register all descriptors
calc.descriptors = [d for d in calc.descriptors if str(d) in descriptor_list] # re-register subset of descriptors
print(len(calc.descriptors)) # 5
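The original question reads the descriptor names from a text file and falls back to all descriptors when the file is empty, so option 2 could be wrapped in two small helpers. This is only a sketch: `read_names` and `select_descriptors` are hypothetical names, not part of mordred, and the only assumption is the one option 2 already makes, namely that `str(d)` gives the descriptor name.

```python
from pathlib import Path


def read_names(path):
    """Return the non-empty, stripped lines of a text file as a set of names."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def select_descriptors(all_descriptors, names):
    """Keep descriptors whose str() is in names; keep everything if names is empty."""
    all_descriptors = list(all_descriptors)
    if not names:
        # An empty name list means "use all descriptors".
        return all_descriptors
    return [d for d in all_descriptors if str(d) in names]
```

With mordred this would presumably be used as `calc.descriptors = select_descriptors(calc.descriptors, read_names("descriptors.txt"))` after creating `calc = Calculator(descriptors, ignore_3D=True)`; the file name is a placeholder.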
Please feel free to ask if there are any parts that are unclear.
Thanks,
As a follow-up to @ChrisHill8's question, is it possible to filter the "sub-descriptors", e.g. to calculate only the EState_VSA5 descriptor rather than the full list of EStates?
@DCoupry yes, you can do that. Moreover, non-default descriptors can also be calculated, e.g. a very large ring count, which may be useful for macrocyclic data sets.
example:
from mordred import Calculator
from mordred.MoeType import EState_VSA
from mordred.RingCount import RingCount
calc = Calculator(EState_VSA(5))
print(len(calc.descriptors)) # 1
calc(mol)
ring18 = RingCount(order=18) # 18-membered ring count descriptor
ring18(mol) # a descriptor instance can also be called like a function
The list of default descriptors is available at http://mordred-descriptor.github.io/documentation/master/descriptors.html.
Thanks,
@philopon Is there a version of option 2 that allows specifying only the modules, not submodules?
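One possible way to do this, not confirmed by the maintainer in this thread, is to filter on the module in which each descriptor's class is defined. The sketch below relies only on the standard Python `__module__` attribute and assumes that mordred descriptor classes live in modules named like `mordred.ABCIndex`, as the imports in option 1 suggest; `select_by_module` is a hypothetical helper, not a mordred function.

```python
def select_by_module(all_descriptors, module_names):
    """Keep descriptors whose class is defined in one of the given modules.

    module_names holds the last component of the module path,
    e.g. "ABCIndex" for a class defined in "mordred.ABCIndex" (assumed layout).
    """
    wanted = set(module_names)
    return [
        d for d in all_descriptors
        if type(d).__module__.rsplit(".", 1)[-1] in wanted
    ]
```

Under that assumption it would be applied like the name filter in option 2, e.g. `calc.descriptors = select_by_module(calc.descriptors, {"ABCIndex", "Weight"})`. Note that option 1 already accepts a whole module, such as `ABCIndex`, when registering descriptors.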
Hi there, I noticed that all VSA_EState descriptors are producing values outside their expected range. I tried with both the 2017 and the 2020 versions of rdkit.
There was an issue posted on the rdkit GitHub about this that was marked as solved in 2019, but I'm still having this kind of trouble.
The newer versions I've tried are listed here:
mordred 1.2.0 pyhe5148d4_0 mordred-descriptor
rdkit 2020.03.3.0 py37hc20afe1_1 rdkit
Here is a sample of what I'm talking about:
   VSA_EState1  VSA_EState2  VSA_EState3  ...  VSA_EState7  VSA_EState8  VSA_EState9
0     0.000000   104.095806    19.619795  ...     2.745955    19.849813     4.408183
1    16.909374    13.415262     5.121789  ...     1.614642     4.129145     0.000000
2    11.307091    15.540997     4.334151  ...     4.226702     5.391886     0.000000
3    57.150084    12.272865    19.070186  ...     1.430556     0.000000    -7.333333
4     1.280530    23.899946     8.270205  ...     0.000000     0.139850     0.000000
Thank you all
Hello there. For instance, what if I only want the EState descriptors with type='count'? It seems that specifying this parameter in EState.AtomTypeEState is not enough, and the estate parameter does not accept a list of all.
The list of descriptors that I'm trying to build can be either generic, where I just pick a module, or specific, as in this case.
Thank you so much for your help.
description
I want to be able to choose which Mordred descriptors are calculated. For this I read a list of strings from a text file. If the text file is empty, I use all descriptors.
My code example is obviously wrong, but I want to solve it similarly to my script for the RDKit descriptors. I hope you can help me out!
minimal reproduction code
# either getting a list (now minimal example)
descriptor_list = ['ABC', 'nAcid', 'MW', 'Zagreb1', 'WPath']
# or using every mordred calculator
mordred_descriptors = Calculator(descriptors, ignore_3D=True)
descriptor_list = list(mordred_descriptors._name_dict.keys())
calculator = Calculator(descriptor_list, ignore_3D=True)
df_descriptors_calc = calculator.pandas(molecules)
# RDKit code equivalent
calculator = MoleculeDescriptors.MolecularDescriptorCalculator(descriptor_list)
descriptors_calc = np.asarray([calculator.CalcDescriptors(mol) for mol in molecules])
df_descriptors_calc = pd.DataFrame(descriptors_calc, columns=descriptor_list)
environment
OS/distribution
Mac OS 10.14.3
conda or pip
conda
python version
Python 3.6.8 :: Anaconda, Inc.
library version
mordred 1.1.1 py36h39e3cac_0 mordred-descriptor
rdkit 2018.09.2 py36h4e14f70_0 conda-forge