mordred-descriptor / mordred

a molecular descriptor calculator
http://mordred-descriptor.github.io/documentation/master/
BSD 3-Clause "New" or "Revised" License

Question: Calculate only Mordred descriptors contained in list of strings #73

Open christophhillisch opened 5 years ago

christophhillisch commented 5 years ago

description

I want to be able to choose which Mordred descriptors are calculated. For this I read a list of strings from a text file. If the text file is empty, I use all descriptors.

My code example is obviously wrong, but I want to solve it similarly to my script for the RDKit descriptors. I hope you can help me out!

minimal reproduction code

# either getting a list (now minimal example)
descriptor_list = ['ABC', 'nAcid', 'MW', 'Zagreb1', 'WPath']

# or using every mordred calculator
mordred_descriptors = Calculator(descriptors, ignore_3D=True)
descriptor_list = list(mordred_descriptors._name_dict.keys())

calculator = Calculator(descriptor_list, ignore_3D=True)
df_descriptors_calc = calculator.pandas(molecules)

RDKit code equivalent

calculator = MoleculeDescriptors.MolecularDescriptorCalculator(descriptor_list)
descriptors_calc = np.asarray([calculator.CalcDescriptors(mol) for mol in molecules])
df_descriptors_calc = pd.DataFrame(descriptors_calc, columns=descriptor_list)

environment

OS/distribution

Mac OS 10.14.3

conda or pip

conda

python version

Python 3.6.8 :: Anaconda, Inc.

library version

mordred  1.1.1      py36h39e3cac_0  mordred-descriptor
rdkit    2018.09.2  py36h4e14f70_0  conda-forge

philopon commented 5 years ago

Thank you for your question.

There are two options for calculating a subset of descriptors.

  1. (recommended) registering specific descriptors.
from mordred import Calculator, ABCIndex, AcidBase, Weight, ZagrebIndex, WienerIndex

calc = Calculator([
    ABCIndex,  # register all presets of descriptors in module (register ABC and ABCGG)
    AcidBase.AcidicGroupCount,  # register all presets of the descriptor (register nAcid)
    Weight.Weight(),  # register the descriptor (MW)
    ZagrebIndex.ZagrebIndex(1, 1),  # Zagreb1
    WienerIndex.WienerIndex(False),  # WPath
], ignore_3D=True)

print(len(calc.descriptors))  # 6
  2. filtering descriptors.

Option 1 is verbose, so you can also filter the registered descriptors by name.

from mordred import Calculator, descriptors

descriptor_list = {'ABC', 'nAcid', 'MW', 'Zagreb1', 'WPath'}
calc = Calculator(descriptors, ignore_3D=True)  # register all descriptors
calc.descriptors = [d for d in calc.descriptors if str(d) in descriptor_list]  # re-register subset of descriptors

print(len(calc.descriptors))  # 5
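
The filtering approach can be combined with the text file from the question. The following is a minimal sketch, not part of mordred itself: `parse_descriptor_names`, `select_descriptors`, and `build_calculator` are hypothetical helper names, and the file is assumed to hold one descriptor name per line. An empty selection keeps every descriptor, as the question asks.

```python
def parse_descriptor_names(lines):
    """Return the set of non-empty, stripped names from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}


def select_descriptors(all_descriptors, wanted):
    """Keep descriptors whose str() is in `wanted`; keep all if `wanted` is empty."""
    if not wanted:
        return list(all_descriptors)
    return [d for d in all_descriptors if str(d) in wanted]


def build_calculator(path, ignore_3D=True):
    """Build a mordred Calculator restricted to the names listed in `path`."""
    # Imported lazily so the two helpers above work without mordred installed.
    from mordred import Calculator, descriptors

    with open(path) as handle:
        wanted = parse_descriptor_names(handle)

    calc = Calculator(descriptors, ignore_3D=ignore_3D)  # register all descriptors
    calc.descriptors = select_descriptors(calc.descriptors, wanted)  # keep the subset
    return calc
```

With this sketch, `build_calculator("descriptors.txt").pandas(molecules)` would compute only the requested columns (`descriptors.txt` is a placeholder file name). Names in the file that match no descriptor are silently ignored, so a typo results in a missing column rather than an error.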

Please feel free to ask if there are any parts that are unclear.

Thanks,

DCoupry commented 5 years ago

As a subset of @ChrisHill8's question: is it in any way possible to filter the "sub-descriptors"? E.g. calculating only the EState_VSA5 descriptor, not the full list of EStates?

philopon commented 5 years ago

@DCoupry yes, you can do it. Moreover, non-default descriptors (e.g. a very large ring count, which may be useful for macrocyclic data sets) can also be calculated.

example:

from rdkit import Chem
from mordred import Calculator
from mordred.MoeType import EState_VSA
from mordred.RingCount import RingCount

mol = Chem.MolFromSmiles('c1ccccc1')  # example molecule (benzene), added so the snippet runs

calc = Calculator(EState_VSA(5))  # register only EState_VSA5
print(len(calc.descriptors))  # 1
calc(mol)

ring18 = RingCount(order=18)  # 18-membered ring count descriptor
ring18(mol)  # a descriptor instance can also be called like a function

The list of default descriptors is available at http://mordred-descriptor.github.io/documentation/master/descriptors.html.

Thanks,

naefl commented 5 years ago

@philopon Is there a version of option 2 that allows specifying only the modules, not submodules?
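
One possible way to do this, sketched under assumptions rather than confirmed by the maintainers: option 1 above already shows that passing a module such as `ABCIndex` to `Calculator` registers all of its presets, so a list of modules can be passed directly. To filter an existing calculator instead, the defining module of each descriptor can be read from `type(d).__module__`. This assumes that mordred's descriptor classes live in modules named like `mordred.ABCIndex`. `module_of`, `filter_by_module`, and `build_module_calculator` are hypothetical helper names.

```python
def module_of(obj):
    """Return the last component of the module that defines obj's class."""
    return type(obj).__module__.rsplit(".", 1)[-1]


def filter_by_module(descriptor_objects, wanted_modules):
    """Keep objects whose class is defined in one of `wanted_modules`."""
    return [d for d in descriptor_objects if module_of(d) in wanted_modules]


def build_module_calculator(wanted_modules, ignore_3D=True):
    """Register all descriptors, then keep those from the named modules."""
    # Imported lazily so the two helpers above work without mordred installed.
    from mordred import Calculator, descriptors

    calc = Calculator(descriptors, ignore_3D=ignore_3D)
    calc.descriptors = filter_by_module(calc.descriptors, set(wanted_modules))
    return calc
```

For example, `build_module_calculator(["ABCIndex", "Weight"])` would keep only the descriptors defined in those two modules, provided the module-naming assumption holds for the installed version.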

luizimgiolo commented 4 years ago

Hi there, I noticed that all of the VSA_EState descriptors are producing values outside their expected range. I tried both a 2017 RDKit version and a 2020 version.

There was an issue posted on the RDKit GitHub about this that was marked as solved in 2019, but I'm still having this kind of trouble.

The newer versions I've tried are listed here:

mordred  1.2.0        pyhe5148d4_0    mordred-descriptor
rdkit    2020.03.3.0  py37hc20afe1_1  rdkit

A sample of what I'm talking about:

   VSA_EState1  VSA_EState2  VSA_EState3  ...  VSA_EState7  VSA_EState8  VSA_EState9
0     0.000000   104.095806    19.619795  ...     2.745955    19.849813     4.408183
1    16.909374    13.415262     5.121789  ...     1.614642     4.129145     0.000000
2    11.307091    15.540997     4.334151  ...     4.226702     5.391886     0.000000
3    57.150084    12.272865    19.070186  ...     1.430556     0.000000    -7.333333
4     1.280530    23.899946     8.270205  ...     0.000000     0.139850     0.000000

Thank you all

brunocalcada commented 2 years ago

Hello there. For instance, what if I just want the EState descriptors with type='count'? It seems that specifying this parameter in EState.AtomTypeEState is not enough, and the estate parameter does not accept a list of all of them.

The list of descriptors that I'm trying to build can be more generic, where I just pick the module, or more specific, like in this case.

Thank you so much for your help.
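
A possible workaround, offered as a sketch rather than a confirmed answer: register everything, then keep only the descriptors of one class whose names match a prefix. This assumes that count-type `AtomTypeEState` descriptors have names starting with `N` (for example `NsCH3`), as the descriptor list in the documentation suggests; please check this against your installed version. `filter_by_class_and_prefix` and `build_estate_count_calculator` are hypothetical helper names.

```python
def filter_by_class_and_prefix(descriptor_objects, class_name, prefix):
    """Keep objects of exactly the named class whose str() starts with `prefix`."""
    return [
        d for d in descriptor_objects
        if type(d).__name__ == class_name and str(d).startswith(prefix)
    ]


def build_estate_count_calculator(ignore_3D=True):
    """Keep only AtomTypeEState descriptors whose name starts with 'N'."""
    # Imported lazily so the helper above works without mordred installed.
    from mordred import Calculator, descriptors

    calc = Calculator(descriptors, ignore_3D=ignore_3D)
    # Assumption: the 'N' prefix marks the count variants (e.g. NsCH3).
    calc.descriptors = filter_by_class_and_prefix(
        calc.descriptors, "AtomTypeEState", "N"
    )
    return calc
```

The same helper covers the more generic case as well: an empty prefix (`""`) keeps every descriptor of the named class.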