horsepurve / DeepB3P3

Masked peptides for low-data peptide drug discovery (BiB 2023)
MIT License
data-augmentation deep-learning drug-discovery peptides uncertainty

DeepB3P3: masked peptide transformer for low-data peptide drug discovery

Installation

The required packages are listed in requirements.txt.

Datasets

| Source              | Total number | BBBPs | non-BBBPs |
|---------------------|--------------|-------|-----------|
| B3Pred Training set | 2367         | 215   | 2152      |
| B3Pred Testing set  | 592          | 54    | 538       |
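
The counts above imply a strong class imbalance: BBBPs make up roughly 9% of each split. The short check below uses only the numbers from the table. The inverse-frequency class weights are an illustrative assumption about how one might compensate for the imbalance, not necessarily what DeepB3P3 does.

```python
# Counts copied from the dataset table above.
datasets = {
    "train": {"total": 2367, "bbbp": 215, "non_bbbp": 2152},
    "test": {"total": 592, "bbbp": 54, "non_bbbp": 538},
}


def positive_fraction(counts):
    """Fraction of peptides in a split that are BBBPs."""
    return counts["bbbp"] / counts["total"]


def inverse_frequency_weights(counts, n_classes=2):
    """Assumed weighting scheme: weight_c = total / (n_classes * count_c)."""
    return {
        "bbbp": counts["total"] / (n_classes * counts["bbbp"]),
        "non_bbbp": counts["total"] / (n_classes * counts["non_bbbp"]),
    }


for name, counts in datasets.items():
    # Sanity check: the two classes add up to the reported total.
    assert counts["bbbp"] + counts["non_bbbp"] == counts["total"]
    weights = inverse_frequency_weights(counts)
    print(f"{name}: {positive_fraction(counts):.1%} BBBPs, "
          f"weights bbbp={weights['bbbp']:.2f}, non_bbbp={weights['non_bbbp']:.2f}")
```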

Masking peptides for the small-data challenge

Drug discovery datasets can be extremely small because experiments are costly (1,2), yet training modern neural networks typically requires large-scale, high-quality data. In this paper, we introduce 'masked peptides', which substantially alleviate this issue (Fig. (A)).

Unlike other data augmentation methods, our peptide masking technique does not involve any substitution, insertion, or deletion, yet it can significantly change the latent distribution.
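
As a rough illustration of the idea, the sketch below hides a random subset of residues behind a placeholder symbol while leaving the sequence length and all remaining residues untouched. The mask symbol, masking fraction, number of copies, and function names are all assumptions made for illustration; the masking scheme actually used by DeepB3P3 is defined in the paper and the repository code.

```python
import random

MASK = "X"  # hypothetical mask symbol; the repository may use a different token


def mask_peptide(seq, mask_frac=0.15, rng=None):
    """Return a copy of seq with a random subset of positions masked.

    No residue is substituted by another amino acid, inserted, or deleted:
    the length is preserved and unmasked residues stay in place.
    """
    if not seq:
        return seq
    rng = rng or random.Random()
    # Mask at least one position, but never more than the sequence length.
    n_mask = min(len(seq), max(1, round(len(seq) * mask_frac)))
    positions = set(rng.sample(range(len(seq)), n_mask))
    return "".join(MASK if i in positions else aa for i, aa in enumerate(seq))


def augment(seq, n_copies=8, mask_frac=0.15, seed=0):
    """Return the original peptide plus (n_copies - 1) masked variants."""
    rng = random.Random(seed)
    return [seq] + [mask_peptide(seq, mask_frac, rng) for _ in range(n_copies - 1)]


if __name__ == "__main__":
    # Example peptide chosen for illustration only.
    for variant in augment("GRKKRRQRRRPPQ"):
        print(variant)
```

Each masked variant keeps the label of its parent peptide, so a single labeled sequence yields several training examples.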

Training

mkdir temp
python DeepB3P3.py \
    --train_path 'bbbp/d3_train_a1x8.txt' \
    --test_path 'bbbp/d3_test_a1x8.txt' \
    --result_path 'temp/d3_test.pred.txt' \
    --log_path 'temp/d3_test.txt.log' \
    --max_length 75 \
    --conv1_kernel 10 \
    --conv2_kernel 10 \
    --regCLASS --LR 0.001 --EVALUATE_ALL --NUM_EPOCHS 50

Or experiment with multiple magnitudes of data augmentation using a single script.

mkdir collect
bash run.sh
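
The training command above can also be assembled programmatically, for example to vary one setting per run. The sketch below is a hypothetical helper, not a description of what run.sh does; it uses only the flags and paths shown in the command above and prints the resulting command line instead of executing it.

```python
import shlex


def build_train_command(
    train_path="bbbp/d3_train_a1x8.txt",
    test_path="bbbp/d3_test_a1x8.txt",
    result_path="temp/d3_test.pred.txt",
    log_path="temp/d3_test.txt.log",
    max_length=75,
    conv1_kernel=10,
    conv2_kernel=10,
    lr=0.001,
    num_epochs=50,
):
    """Build the argument list for DeepB3P3.py from the README's flags."""
    return [
        "python", "DeepB3P3.py",
        "--train_path", train_path,
        "--test_path", test_path,
        "--result_path", result_path,
        "--log_path", log_path,
        "--max_length", str(max_length),
        "--conv1_kernel", str(conv1_kernel),
        "--conv2_kernel", str(conv2_kernel),
        "--regCLASS",
        "--LR", str(lr),
        "--EVALUATE_ALL",
        "--NUM_EPOCHS", str(num_epochs),
    ]


if __name__ == "__main__":
    cmd = build_train_command()
    # Print the command; to execute it, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```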

Analysis

Pretrained model files: Google Drive. Please download the file (163 MB) and unzip it into 'DeepB3P3/collect/8/max75'. Then follow the Jupyter notebook 'DeepB3P3_Analysis.ipynb'.

Prediction

To make predictions, you can run the model online in Google Colab.

Reference

@article{ma2023prediction,
  title={A prediction model for blood-brain barrier penetrating peptides based on masked peptide transformers with dynamic routing},
  author={Ma, Chunwei and Wolfinger, Russ},
  journal={Briefings in Bioinformatics},
  volume={24},
  number={6},
  pages={bbad399},
  year={2023},
  publisher={Oxford University Press}
}

Please let me know if you have any questions about this research.