DeepB³P³: masked peptide transformer for low-data peptide drug discovery

Installation

Please see requirements.txt.

Datasets

Source	Total number	BBBPs	non-BBBPs
B3Pred Training set	2367	215	2152
B3Pred Testing set	592	54	538

Masking peptides for small data challenge

The size of drug discovery datasets can be extremely limited due to the high cost of the experiments (1,2). However, the training of modern neural networks typically requires large-scale high-quality data. In this paper, we introduce 'masked peptide' that can significantly overcome this issue (Fig. (A)).

Unlike other data augmentation methods, our masking peptide technique does not involve any substitution, insertion, or deletion, but it can significantly change the latent distribution, as follows.

Training

mkdir temp
python DeepB3P3.py \
    --train_path 'bbbp/d3_train_a1x8.txt' \
    --test_path 'bbbp/d3_test_a1x8.txt' \
    --result_path 'temp/d3_test.pred.txt' \
    --log_path 'temp/d3_test.txt.log' \
    --max_length 75 \
    --conv1_kernel 10 \
    --conv2_kernel 10 \
    --regCLASS --LR 0.001 --EVALUATE_ALL --NUM_EPOCHS 50

Or experiment with multiple magnitudes of data augmentation using a single script.

mkdir collect
bash run.sh

Analysis

Pretrained model files: Google Drive. Please download the file (163MB) and unzip to 'DeepB3P3/collect/8/max75'. Then follow the jupyter notebook 'DeepB3P3_Analysis.ipynb'.

Prediction

For prediction, you may run it online at google colab.

Reference

@article{ma2023prediction,
  title={A prediction model for blood-brain barrier penetrating peptides based on masked peptide transformers with dynamic routing},
  author={Ma, Chunwei and Wolfinger, Russ},
  journal={Briefings in Bioinformatics},
  volume={24},
  number={6},
  pages={bbad399},
  year={2023},
  publisher={Oxford University Press}
}

Please let me know if you have any questions about this research.

horsepurve / DeepB3P3

readme

DeepB³P³: masked peptide transformer for low-data peptide drug discovery

Installation

Datasets

Masking peptides for small data challenge

Training

Analysis

Prediction

Reference

horsepurve / DeepB3P3

readme

DeepB3P3: masked peptide transformer for low-data peptide drug discovery

Installation

Datasets

Masking peptides for small data challenge

Training

Analysis

Prediction

Reference

DeepB³P³: masked peptide transformer for low-data peptide drug discovery