xiaoruiDong / RDMC

Reaction Data and Molecular Conformers (RDMC) is a package for working with reactions, molecules, and conformers, primarily in 3D.
https://xiaoruidong.github.io/RDMC/
MIT License

Add a new module to calculate fingerprints using RDKit #71

Closed xiaoruiDong closed 9 months ago

xiaoruiDong commented 9 months ago

A new module has been added to the RDMC featurizer, inspired by my recent work with the chemprop featurizer. So far, it covers only the fingerprints supported by RDKit: Morgan, atom pair, topological torsion, and RDKitFP. Hopefully, molgraph and condensed graph of reaction can be added in the future when I have time.

The addition provides a simple API call for computing the different fingerprints, built on RDKit's Chem.rdFingerprintGenerator.
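To illustrate the idea, here is a minimal sketch of what such a single-entry-point API could look like on top of rdFingerprintGenerator. The function name `get_fingerprint`, the registry `GENERATOR_FACTORIES`, and the string keys are hypothetical and are not taken from the RDMC code; only the RDKit calls themselves are real.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator as rfg

# Hypothetical registry: fingerprint name -> factory building an RDKit generator.
# The keys are illustrative only; RDMC may name these differently.
GENERATOR_FACTORIES = {
    "morgan": lambda fp_size, radius: rfg.GetMorganGenerator(
        radius=radius, fpSize=fp_size
    ),
    "atom_pair": lambda fp_size, radius: rfg.GetAtomPairGenerator(fpSize=fp_size),
    "topological_torsion": lambda fp_size, radius: rfg.GetTopologicalTorsionGenerator(
        fpSize=fp_size
    ),
    "rdkit": lambda fp_size, radius: rfg.GetRDKitFPGenerator(fpSize=fp_size),
}


def get_fingerprint(mol, fp_type="morgan", fp_size=2048, radius=2, count=False):
    """Return a fingerprint of `mol` as a 1D NumPy array of length `fp_size`.

    `radius` is only used by the Morgan generator in this sketch.
    """
    if fp_type not in GENERATOR_FACTORIES:
        raise ValueError(f"Unknown fingerprint type: {fp_type!r}")
    generator = GENERATOR_FACTORIES[fp_type](fp_size, radius)
    if count:
        # Count fingerprint: each element is the number of hits for that bin.
        return generator.GetCountFingerprintAsNumPy(mol)
    # Bit fingerprint: each element is 0 or 1.
    return generator.GetFingerprintAsNumPy(mol)


if __name__ == "__main__":
    ethanol_like = Chem.MolFromSmiles("CCCCO")
    for name in GENERATOR_FACTORIES:
        fp = get_fingerprint(ethanol_like, fp_type=name, fp_size=1024)
        print(name, fp.shape)
```

With a layout like this, switching fingerprint types is a matter of changing one string argument, and supporting another RDKit generator means adding one entry to the registry.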

P.S. There is another popular way of implementing fingerprint calculation, e.g.:

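import numpy as np
from rdkit import DataStructs
from rdkit.Chem import AllChem

def get_morgan_fingerprint(mol, radius=2, n_bits=1024):
    # Hashed count-based Morgan fingerprint (legacy RDKit API)
    features_vec = AllChem.GetHashedMorganFingerprint(mol, radius=radius, nBits=n_bits)
    features = np.zeros((1,))  # placeholder; resized to n_bits by the call below
    DataStructs.ConvertToNumpyArray(features_vec, features)
    return features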

I did a quick comparison between the above implementation and the one using rdFingerprintGenerator; the former doesn't scale as well as the latter with increasing fpSize. The difference is negligible for 1024, but the former takes almost 2x as long for 2048.
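For anyone who wants to reproduce a comparison of this kind, a rough timing sketch is given below. It is not the benchmark used for the numbers above: the test molecule, the repeat count, and the helper names (`legacy_morgan`, `generator_morgan`, `compare`) are all assumptions, and it assumes the generator's count fingerprint is the fair counterpart of the hashed count fingerprint. Timings will vary with the RDKit version and the machine.

```python
import timeit

import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, rdFingerprintGenerator


def legacy_morgan(mol, radius=2, n_bits=1024):
    """Hashed Morgan count fingerprint via the legacy AllChem function."""
    vec = AllChem.GetHashedMorganFingerprint(mol, radius, nBits=n_bits)
    arr = np.zeros((1,))  # resized to n_bits by ConvertToNumpyArray
    DataStructs.ConvertToNumpyArray(vec, arr)
    return arr


def generator_morgan(mol, radius=2, n_bits=1024):
    """Morgan count fingerprint via rdFingerprintGenerator."""
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    return gen.GetCountFingerprintAsNumPy(mol)


def compare(smiles="CC(=O)Oc1ccccc1C(=O)O", sizes=(1024, 2048), repeats=200):
    """Return {n_bits: (legacy_seconds, generator_seconds)} for `repeats` calls each."""
    mol = Chem.MolFromSmiles(smiles)
    results = {}
    for n_bits in sizes:
        t_legacy = timeit.timeit(
            lambda: legacy_morgan(mol, n_bits=n_bits), number=repeats
        )
        t_generator = timeit.timeit(
            lambda: generator_morgan(mol, n_bits=n_bits), number=repeats
        )
        results[n_bits] = (t_legacy, t_generator)
    return results


if __name__ == "__main__":
    for n_bits, (t_legacy, t_generator) in compare().items():
        print(f"fpSize={n_bits}: legacy {t_legacy:.4f} s, generator {t_generator:.4f} s")
```

Note that this sketch rebuilds the generator on every call; creating it once and reusing it across molecules would be the more typical usage and may change the relative timings.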