Open ErikCVik opened 3 years ago
Start here:
https://www.rdkit.org/docs/GettingStartedInPython.html#list-of-available-descriptors
In essence, descriptors are not fingerprints: many molecular structures may share identical descriptor values (in some cases, infinitely many molecules), which is known as descriptor degeneracy, whereas fingerprints are designed with the intent of zero degeneracy.
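To make degeneracy concrete, here is a minimal sketch in plain Python. The two carbon skeletons (n-butane and isobutane), the helper names, and the use of a sorted degree sequence as a stand-in for a more structure-sensitive representation are all my own illustrative assumptions. It is not how RDKit or Mordred compute anything, and a degree sequence is not a real fingerprint.

```python
# Carbon skeletons as bond lists (hydrogens ignored); atom numbering is arbitrary.
n_butane = [(0, 1), (1, 2), (2, 3)]    # C-C-C-C
isobutane = [(0, 1), (0, 2), (0, 3)]   # one central carbon bonded to three others


def atom_count(bonds):
    """Number of distinct atoms that appear in the bond list."""
    return len({atom for bond in bonds for atom in bond})


def bond_count(bonds):
    """Number of bonds in the skeleton."""
    return len(bonds)


def degree_sequence(bonds):
    """Sorted neighbour counts per atom: a crude, structure-sensitive signature."""
    degrees = {}
    for i, j in bonds:
        degrees[i] = degrees.get(i, 0) + 1
        degrees[j] = degrees.get(j, 0) + 1
    return sorted(degrees.values())


# Both simple descriptors are degenerate for this pair of isomers (4 atoms, 3 bonds)...
counts_butane = (atom_count(n_butane), bond_count(n_butane))
counts_isobutane = (atom_count(isobutane), bond_count(isobutane))

# ...while the finer-grained signature tells them apart.
signature_butane = degree_sequence(n_butane)      # [1, 1, 2, 2]
signature_isobutane = degree_sequence(isobutane)  # [1, 1, 1, 3]
```

Two different structures map to the same descriptor values, which is exactly the degeneracy described above; a representation that encodes more of the connectivity separates them.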
As for "matrix aggregating": many molecular descriptors are invariant properties determined from structure-derived graphs, or from other representations in the form of matrices. Matrices facilitate the derivation of invariant properties. As something of an illustration:
Consider any hydrocarbon, CnHm. Ignore the H atoms. Create a table (matrix) with rows and columns numbered 1 through n, one per carbon, so the matrix is square with dimensions n x n. Different systems (rules) are used to populate the table, e.g. the entry at (Ci, Cj) is 1 when Ci is bonded to Cj and 0 otherwise (the adjacency matrix, A). Subtract an unknown x from each diagonal entry and set the determinant of the result to zero: det(A - xI) = 0. The determinant is a scalar, and expanding it gives a polynomial equation of degree n in x (the characteristic equation). Solving that equation yields n roots, which are the eigenvalues of the matrix. They are invariant: renumbering the atoms changes the matrix but not its eigenvalues. One well-known rule system (Hückel theory) for populating the matrix gives eigenvalues that correspond to the pi molecular orbital energies, and the associated eigenvectors are the atomic orbital coefficients of the molecular orbitals as a linear combination of atomic orbitals (LCAO).
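The recipe above can be sketched in plain Python. The example molecule (the four-carbon skeleton of butadiene), the function names, and the use of the Faddeev-LeVerrier recurrence to expand the determinant are my own choices for illustration; in practice one would more likely hand the matrix to a numerical eigenvalue routine.

```python
from fractions import Fraction
from math import sqrt


def adjacency_matrix(n, bonds):
    """n x n table: 1 where carbon i is bonded to carbon j, 0 otherwise."""
    a = [[0] * n for _ in range(n)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1
    return a


def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(a):
    """Coefficients of det(xI - A), highest power first (Faddeev-LeVerrier).

    det(xI - A) and det(A - xI) differ at most by a sign, so they share roots.
    """
    n = len(a)
    coeffs = [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        am = matmul(a, m)
        coeffs.append(-sum(am[i][i] for i in range(n)) / k)
    return coeffs


def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x."""
    value = 0.0
    for c in coeffs:
        value = value * x + float(c)
    return value


# Carbon skeleton of butadiene, a four-atom chain, numbered two different ways.
chain = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
renumbered = adjacency_matrix(4, [(2, 0), (0, 3), (3, 1)])

poly = char_poly(chain)            # x^4 - 3x^2 + 1
same_poly = char_poly(renumbered)  # identical, although the matrices differ

# The four roots (eigenvalues), roughly +/-1.618 and +/-0.618.
phi = (1 + sqrt(5)) / 2
roots = [phi, 1 / phi, -1 / phi, -phi]
residuals = [evaluate(poly, r) for r in roots]  # all numerically zero
```

The two numberings give different matrices but the same characteristic polynomial, and therefore the same eigenvalues, which is the sense in which the eigenvalues are invariant.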
Fundamentally, this is the procedure used for generating descriptors from matrices. One matrix may beget many varieties of additional matrices, transformations may be applied, relationships between the matrices may be evaluated, and so on.
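As a rough sketch of what "one matrix begets another, then gets reduced to numbers" can look like, the plain-Python example below derives a topological distance matrix from an adjacency matrix and collapses it into a few scalars. The function names, the choice of aggregations, and the two example skeletons are my own assumptions for illustration; they are not taken from Mordred's or RDKit's implementation.

```python
def adjacency_matrix(n, bonds):
    """n x n table: 1 where atom i is bonded to atom j, 0 otherwise."""
    a = [[0] * n for _ in range(n)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1
    return a


def distance_matrix(adj):
    """Shortest path lengths (in bonds) between all atom pairs, via Floyd-Warshall."""
    n = len(adj)
    inf = float("inf")
    dist = [[0 if i == j else (1 if adj[i][j] else inf) for j in range(n)]
            for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def aggregate(matrix):
    """Collapse a symmetric integer matrix into a few scalar values."""
    n = len(matrix)
    return {
        # Half the sum of all entries; for a distance matrix this is the Wiener index.
        "half_sum": sum(sum(row) for row in matrix) // 2,
        # Largest entry; for a distance matrix this is the graph diameter.
        "max_entry": max(max(row) for row in matrix),
        # Sum of the diagonal.
        "trace": sum(matrix[i][i] for i in range(n)),
    }


# Carbon skeletons with hydrogens ignored.
butane = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])     # straight chain
isobutane = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3)])  # branched

butane_values = aggregate(distance_matrix(butane))
# {'half_sum': 10, 'max_entry': 3, 'trace': 0}
isobutane_values = aggregate(distance_matrix(isobutane))
# {'half_sum': 9, 'max_entry': 2, 'trace': 0}
```

Each aggregation yields one number per molecule regardless of how the atoms were numbered, which is what makes the result usable as a descriptor.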
Such descriptors may seem abstract or ethereal because of the apparent lack of physical meaning in relation to chemical properties. However, such a perspective is no more valid than the belief that eigenvalues we call quantum numbers reflect physical phenomena such as spin, or angular momentum.
Their value is not in some physical relation to a physical property or activity, but in their incorporation into models with sufficient prediction quality to make them useful.
All models are wrong, but some are useful.
Regards,
plkx
A bit late perhaps, but here is the actual Mordred overview: http://mordred-descriptor.github.io/documentation/master/descriptors.html
For example, what are matrix aggregating methods and can you give a description for each one more elaborately? Is this similar to chemical fingerprints or not at all?