Please see requirements.txt.
Source | Total number | BBBPs | non-BBBPs |
---|---|---|---|
B3Pred Training set | 2367 | 215 | 2152 |
B3Pred Testing set | 592 | 54 | 538 |
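The class balance implied by the table can be checked with a few lines of Python. This is only an illustrative sketch: the counts are copied from the table above, while the dictionary layout and the `positive_fraction` helper are hypothetical names and are not part of the repository.

```python
# Counts copied from the table above; the structure and names are illustrative.
counts = {
    "B3Pred Training set": {"total": 2367, "bbbp": 215, "non_bbbp": 2152},
    "B3Pred Testing set": {"total": 592, "bbbp": 54, "non_bbbp": 538},
}


def positive_fraction(row):
    """Fraction of peptides in a split that are BBBPs."""
    return row["bbbp"] / row["total"]


for name, row in counts.items():
    # Sanity check: the two classes should add up to the reported total.
    assert row["bbbp"] + row["non_bbbp"] == row["total"]
    ratio = row["non_bbbp"] / row["bbbp"]
    print(f"{name}: {positive_fraction(row):.1%} BBBPs, "
          f"about {ratio:.1f} non-BBBPs per BBBP")
```

In both splits roughly 9% of the peptides are BBBPs, so the positive class is small in absolute terms (215 training examples).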
Drug discovery datasets can be extremely small because the underlying experiments are costly (1,2), yet training modern neural networks typically requires large-scale, high-quality data. In this paper, we introduce the 'masked peptide', a data augmentation technique that can substantially mitigate this issue (Fig. (A)).
Unlike other data augmentation methods, our peptide masking technique does not involve any substitution, insertion, or deletion, yet it can significantly change the latent distribution.
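As a rough illustration of the idea, the sketch below generates masked copies of a peptide. It rests on assumptions that the text does not confirm: masking is interpreted as hiding randomly chosen residues behind a placeholder token while leaving the sequence length and all other residues unchanged. The mask symbol `-`, the number of masked positions, the function names `mask_peptide` and `augment`, and the example sequence are all hypothetical; the actual implementation in `DeepB3P3.py` may differ.

```python
import random

MASK_TOKEN = "-"  # hypothetical mask symbol; the repository may use another one


def mask_peptide(seq, n_masks=1, rng=None):
    """Return a copy of seq with n_masks random positions replaced by MASK_TOKEN.

    The sequence length is preserved and unmasked residues are left untouched.
    """
    if rng is None:
        rng = random.Random()
    if not 0 <= n_masks <= len(seq):
        raise ValueError("n_masks must be between 0 and len(seq)")
    residues = list(seq)
    # Sample distinct positions so that exactly n_masks residues are hidden.
    for pos in rng.sample(range(len(seq)), n_masks):
        residues[pos] = MASK_TOKEN
    return "".join(residues)


def augment(seq, copies=8, n_masks=1, seed=0):
    """Return the original peptide followed by copies - 1 masked variants."""
    rng = random.Random(seed)
    return [seq] + [mask_peptide(seq, n_masks, rng) for _ in range(copies - 1)]


if __name__ == "__main__":
    # Arbitrary example sequence, not taken from the B3Pred dataset.
    for variant in augment("GRKKRRQRRRPPQ", copies=8, n_masks=2):
        print(variant)
```

Fixing the seed keeps the augmented set reproducible between runs. The `copies` argument plays the role of the augmentation magnitude in this sketch.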
mkdir temp
python DeepB3P3.py \
--train_path 'bbbp/d3_train_a1x8.txt' \
--test_path 'bbbp/d3_test_a1x8.txt' \
--result_path 'temp/d3_test.pred.txt' \
--log_path 'temp/d3_test.txt.log' \
--max_length 75 \
--conv1_kernel 10 \
--conv2_kernel 10 \
--regCLASS --LR 0.001 --EVALUATE_ALL --NUM_EPOCHS 50
Alternatively, run experiments across multiple magnitudes of data augmentation with a single script:
mkdir collect
bash run.sh
Pretrained model files: Google Drive. Please download the file (163 MB) and unzip it to 'DeepB3P3/collect/8/max75'. Then follow the Jupyter notebook 'DeepB3P3_Analysis.ipynb'.
For prediction, you can also run the model online in Google Colab.
@article{ma2023prediction,
title={A prediction model for blood-brain barrier penetrating peptides based on masked peptide transformers with dynamic routing},
author={Ma, Chunwei and Wolfinger, Russ},
journal={Briefings in Bioinformatics},
volume={24},
number={6},
pages={bbad399},
year={2023},
publisher={Oxford University Press}
}
Please let me know if you have any questions about this research.