mordred-descriptor / mordred

a molecular descriptor calculator
http://mordred-descriptor.github.io/documentation/master/
BSD 3-Clause "New" or "Revised" License

Can you provide a link to what these descriptors actually are? #91

Open ErikCVik opened 3 years ago

ErikCVik commented 3 years ago

For example, what are the matrix aggregating methods, and can you describe each one in more detail? Are they similar to chemical fingerprints, or not at all?

plkx commented 3 years ago

Start here:

https://www.rdkit.org/docs/GettingStartedInPython.html#list-of-available-descriptors

In essence, they are not fingerprints, since many molecular structures may have identical descriptor values (in some cases, infinitely many molecules), which is known as descriptor degeneracy, whereas fingerprints are designed with the intent of zero degeneracy.
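To make descriptor degeneracy concrete, here is a minimal sketch in plain Python. It assumes a deliberately crude descriptor (carbon count and C-C bond count of the hydrogen-suppressed skeleton); the helper names `skeleton_counts` and `degree_sequence` are made up for this illustration and are not part of Mordred or RDKit.

```python
def skeleton_counts(bonds):
    """Return (number of carbon atoms, number of C-C bonds) for a bond list."""
    atoms = {atom for bond in bonds for atom in bond}
    return len(atoms), len(bonds)


def degree_sequence(bonds):
    """Return the sorted list of how many carbon neighbours each carbon has."""
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sorted(degree.values())


# Hydrogen-suppressed carbon skeletons, atoms numbered from 0.
n_butane = [(0, 1), (1, 2), (2, 3)]   # C-C-C-C chain
isobutane = [(0, 1), (0, 2), (0, 3)]  # central carbon with three neighbours

# The crude descriptor cannot tell the two isomers apart (degenerate) ...
print(skeleton_counts(n_butane))   # (4, 3)
print(skeleton_counts(isobutane))  # (4, 3)

# ... although the structures differ, as the degree sequences show.
print(degree_sequence(n_butane))   # [1, 1, 2, 2]
print(degree_sequence(isobutane))  # [1, 1, 1, 3]
```

Real descriptors are far less crude than an atom and bond count, but the same effect appears whenever distinct structures map to the same value.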

As for "matrix aggregating:" Many molecular descriptors are invariant properties determined from structure-derived graphs, or other representations in the form of matrices. Matrices facilitate derivation of invariant properties. As somewhat of an illustration:

Consider any hydrocarbon, CnHm. Ignore the H atoms. Create a table (matrix) numbered 1 through n horizontally and vertically, so that the matrix is square, with dimensions n x n. Different systems (rules) are used to populate the table, e.g. the entry at (Ci, Cj) is 1 when Ci is bonded to Cj and 0 otherwise (the adjacency matrix, A). Subtract a variable x from each diagonal entry and set the determinant, which is a scalar value, equal to zero: det(A - xI) = 0. The determinant can be expanded into a polynomial equation of nth degree in x (the characteristic equation), where the matrix entries are the coefficients of the underlying system of linear equations, (A - xI)c = 0. Solving that equation gives n roots, which are the eigenvalues of the matrix. They are invariant: they do not depend on how the atoms were numbered. One well-known rule system (Hückel theory) for populating the matrix yields the π molecular orbital energies as the eigenvalues, while the corresponding eigenvectors give the atomic orbital coefficients of each molecular orbital as a linear combination of atomic orbitals (LCAO).
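The recipe above can be sketched in plain Python for the four-carbon chain of butadiene. This is only an illustration under stated assumptions: the adjacency rule (1 if bonded, 0 otherwise) is the one described above, the characteristic polynomial is obtained with the Faddeev-LeVerrier recurrence rather than by expanding the determinant by hand, and the helper names (`adjacency_matrix`, `mat_mul`, `char_poly`) are hypothetical.

```python
from fractions import Fraction


def adjacency_matrix(n, bonds):
    """Build the n x n matrix with 1 where two carbons are bonded, else 0."""
    a = [[0] * n for _ in range(n)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1
    return a


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(a):
    """Coefficients of det(xI - A), highest power first (Faddeev-LeVerrier)."""
    n = len(a)
    coeffs = [Fraction(1)]                        # leading coefficient of x^n
    m = [[Fraction(0)] * n for _ in range(n)]     # auxiliary matrix M_0 = 0
    for k in range(1, n + 1):
        am = mat_mul(a, m)
        # M_k = A * M_(k-1) + (previous coefficient) * I
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(mat_mul(a, m)[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return [int(c) for c in coeffs]               # integer matrix, integer coefficients


# Butadiene carbon skeleton C0-C1-C2-C3, and the same chain with atoms renumbered.
butadiene = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
relabeled = adjacency_matrix(4, [(2, 0), (0, 3), (3, 1)])

print(char_poly(butadiene))  # [1, 0, -3, 0, 1], i.e. x^4 - 3x^2 + 1
print(char_poly(relabeled))  # same polynomial: the numbering does not matter
```

The roots of x^4 - 3x^2 + 1 are ±1.618 and ±0.618 (to three decimals), the eigenvalues of the adjacency matrix. Because the renumbered chain gives the same polynomial, it gives the same roots, which is what "invariant" means here. With NumPy installed, `numpy.linalg.eigvalsh` would return the eigenvalues of such a symmetric matrix directly.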

Fundamentally, this is the procedure used for generating descriptors from matrices. One matrix may beget many varieties of additional matrices, transformations may be applied, relationships between the matrices evaluated, ….
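As a rough sketch of how one matrix can beget another, the snippet below derives a topological distance matrix from the bond list (via Floyd-Warshall shortest paths) and then collapses it into scalars in two different ways. The choice of the Wiener index and the graph diameter as the aggregates is an assumption made for illustration, not a statement of which aggregations Mordred applies, and the function names are hypothetical.

```python
def distance_matrix(n, bonds):
    """Shortest bond-count distance between every pair of atoms (Floyd-Warshall)."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for i, j in bonds:
        d[i][j] = d[j][i] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def wiener_index(d):
    """Aggregate 1: sum of distances over all unordered atom pairs."""
    n = len(d)
    return sum(d[i][j] for i in range(n) for j in range(i + 1, n))


def diameter(d):
    """Aggregate 2: largest distance between any two atoms."""
    return max(max(row) for row in d)


# Hydrogen-suppressed carbon skeletons of the two C4H10 isomers.
n_butane = distance_matrix(4, [(0, 1), (1, 2), (2, 3)])
isobutane = distance_matrix(4, [(0, 1), (0, 2), (0, 3)])

print(wiener_index(n_butane), diameter(n_butane))    # 10 3
print(wiener_index(isobutane), diameter(isobutane))  # 9 2
```

Each aggregate turns a whole matrix into a single number, so the same matrix can yield several descriptors depending on how it is summarized.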

Such descriptors may seem abstract or ethereal because of the apparent lack of physical meaning in relation to chemical properties. However, such a perspective is no more valid than the belief that eigenvalues we call quantum numbers reflect physical phenomena such as spin, or angular momentum.

Their value is not in some physical relation to a physical property or activity, but in their incorporation into models with sufficient prediction quality to make them useful.

All models are wrong, but some are useful.

Regards,

plkx

DocMinus commented 2 years ago

A bit late perhaps, but here is the actual Mordred overview: http://mordred-descriptor.github.io/documentation/master/descriptors.html