Applying deep learning concepts from image detection and graph theory has greatly advanced protein-ligand binding affinity prediction, a challenge with enormous ramifications for both drug discovery and protein engineering. We build upon these advances by designing a novel deep learning architecture consisting of a 3-dimensional convolutional neural network utilizing channel-wise attention and two graph convolutional networks utilizing attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based Convolutional Neural Network) obtains state-of-the-art results on the PDBbind v.2016 core set, the most widely recognized benchmark in the field. We extensively assess the generalizability of our model using multiple train-test splits, each of which maximizes differences between either protein structures, protein sequences, or ligand extended-connectivity fingerprints. Furthermore, we perform 10-fold cross-validation with a similarity cutoff between SMILES strings of ligands in the training and test sets, and also evaluate the performance of HAC-Net on lower-quality data. We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction.
HAC-Net (Hybrid Attention-Based Convolutional Neural Network) is a novel deep learning architecture for protein-ligand binding affinity prediction consisting of a 3D-CNN utilizing channel-wise attention and two GCNs utilizing attention-based aggregation of node features. This combination balances the superior performance of our GCNs against the complementary learning style of our 3D-CNN. Furthermore, the inclusion of two architecturally identical GCNs mitigates noise resulting from the inherently stochastic nature of the training process. By incorporating multiple forms of attention with advanced concepts from CNN and GCN architectural design, we demonstrate state-of-the-art performance on the PDBbind benchmark for protein-ligand binding affinity prediction, as well as the ability to generalize to complexes unlike those used for training.
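To make "attention-based aggregation of node features" more concrete, here is a minimal pure-Python sketch of the general idea: each node gets a score, the scores are normalized with a softmax, and the node features are summed using those weights. This is an illustration only and not the layer used in HAC-Net. The function name `attention_aggregate`, the dot-product scoring, and the `gate_weights` vector are assumptions made for this example.

```python
import math


def attention_aggregate(node_features, gate_weights):
    """Pool per-node feature vectors into a single graph-level vector.

    node_features: list of equal-length feature vectors, one per node.
    gate_weights:  hypothetical learned vector used to score each node
                   (an assumption for this sketch, not HAC-Net's gate).
    """
    # Score each node with a dot product against the gate vector.
    scores = [
        sum(f * w for f, w in zip(feats, gate_weights))
        for feats in node_features
    ]
    # Softmax over nodes; subtracting the max keeps exp() numerically stable.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    # Attention-weighted sum of node features, one value per feature dimension.
    dim = len(node_features[0])
    pooled = [
        sum(a * feats[d] for a, feats in zip(attention, node_features))
        for d in range(dim)
    ]
    return pooled, attention


# Toy graph with three nodes and two features per node.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gate = [0.5, -0.5]
pooled, attention = attention_aggregate(nodes, gate)
print("attention weights:", attention)
print("pooled features:", pooled)
```

Nodes with higher scores contribute more to the pooled vector. With an all-zero gate vector the weights become uniform and the result reduces to a plain mean over nodes.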
[Currently under revision]
All of the following files can be found at the link below:
1) HDF files used for training, validation, and testing
2) NPY files containing 3D-CNN extracted features
3) PT files containing model parameters
4) IPYNB files of tutorial notebooks for training and testing
https://drive.google.com/drive/folders/1yB2voUxwzhrQRh0JXnOD3BzY8ZQrbgUK?usp=sharing
https://pubs.acs.org/doi/10.1021/acs.jcim.3c00251
https://arxiv.org/abs/2212.12440
https://pypi.org/project/HACNet/
To install the HACNet package, run:
pip install HACNet
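After installing, one way to confirm that the package is visible to your Python environment is to query its metadata with the standard library. This is a small sketch that uses only `importlib.metadata` (Python 3.8+). The helper name `installed_version` is our own and is not part of HACNet.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Look up the distribution name used in the pip command above.
hacnet_version = installed_version("HACNet")
print("HACNet version:", hacnet_version if hacnet_version else "not installed")
```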
Please feel free to reach out to us through either of the following emails if you have any questions or need any additional files:
gregory.kyro@yale.edu
rafi.brent@yale.edu