Gene silencing through RNA interference (RNAi) has emerged as a powerful tool for studying gene function and developing therapeutics[1]. Small interfering RNA (siRNA) molecules play a crucial role in RNAi by targeting specific mRNA sequences for degradation. Identifying highly efficient siRNA molecules is essential for successful gene silencing experiments and therapeutic applications. Built on the transformer architecture[2], OligoFormer can capture multi-dimensional features and learn complex patterns of siRNA-mRNA interactions for siRNA efficacy prediction.
OligoFormer was trained on the dataset of mRNA-siRNA pairs with experimentally measured efficacy published by Huesken et al.[4]. The training data consist of diverse mRNA sequences and their corresponding siRNA molecules with known efficacies.
dataset | siRNA number | cell line |
---|---|---|
Huesken | 2431 | H1299 |
Reynolds | 240 | HEK293 |
Vickers | 76 | T24 |
Harborth | 44 | HeLa |
Ui-Tei | 62 | HeLa |
Khvorova | 14 | HEK293 |
Hsieh | 108 | HEK293T |
Amarzguioui | 46 | Cos-1, HaCaT |
Takayuki | 702 | HeLa |
Download the repository and create the environment of OligoFormer.
#Clone the OligoFormer repository from GitHub
git clone https://github.com/lulab/OligoFormer.git
cd ./OligoFormer
#Install the required dependencies
conda env create -n oligoformer -f environment.yml
Download the repository and create the environment of RNA-FM.
git clone https://github.com/ml4bio/RNA-FM.git
cd ./RNA-FM
conda env create --name RNA-FM -f environment.yml
Download the pre-trained models from this gdrive link and place the .pth files into the pretrained folder.
You need at least one NVIDIA GPU with a working driver on your system to run training or inference.
source activate oligoformer
# The following command takes ~30 min on a V100 GPU
python scripts/main.py --datasets Hu new --cuda 0 --learning_rate 0.0001 --batch_size 16 --epoch 100 --early_stopping 30
python scripts/main.py --infer 1 --infer_fasta ./data/example.fa --infer_output ./result/
pos sense siRNA efficacy
0 UGAAUUUUUGUCAGAUAAA UUUAUCUGACAAAAAUUCA 0.9139741711020469
1 GAAUUUUUGUCAGAUAAAU AUUUAUCUGACAAAAAUUC 0.8864658409953117
2 AAUUUUUGUCAGAUAAAUA UAUUUAUCUGACAAAAAUU 0.815981000483036
3 AUUUUUGUCAGAUAAAUAA UUAUUUAUCUGACAAAAAU 0.8179122650027275
4 UUUUUGUCAGAUAAAUAAA UUUAUUUAUCUGACAAAAA 0.7880132337212562
5 UUUUGUCAGAUAAAUAAAA UUUUAUUUAUCUGACAAAA 0.7990648913383483
6 UUUGUCAGAUAAAUAAAAU AUUUUAUUUAUCUGACAAA 0.7055106237530708
7 UUGUCAGAUAAAUAAAAUA UAUUUUAUUUAUCUGACAA 0.7850472775697708
8 UGUCAGAUAAAUAAAAUAA UUAUUUUAUUUAUCUGACA 0.8157202693819999
9 GUCAGAUAAAUAAAAUAAA UUUAUUUUAUUUAUCUGAC 0.8842068641781807
# pos: start position of the siRNA target site on the mRNA
# sense: sense strand sequence, complementary to the siRNA
# siRNA: siRNA sequence
# efficacy: predicted efficacy of the siRNA
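The relationship between the `sense` and `siRNA` columns can be reproduced in a few lines of Python. This is a minimal sketch, not OligoFormer's own code: it assumes, consistent with the example rows above, that candidates are every 19-nt window of the mRNA and that the siRNA is the reverse complement of that window. The function names are hypothetical, and the test sequence is the 28-nt stretch reconstructed from the overlapping windows in the example output.

```python
# Illustrative sketch only; not taken from the OligoFormer code base.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A<->U, C<->G)."""
    return rna.translate(COMPLEMENT)[::-1]


def candidate_sirnas(mrna: str, length: int = 19):
    """Yield (pos, sense, siRNA) for every window of `length` nt in the mRNA."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    for pos in range(len(mrna) - length + 1):
        sense = mrna[pos:pos + length]
        yield pos, sense, reverse_complement(sense)


# 28 nt reconstructed from the overlapping windows in the example output.
mrna = "UGAAUUUUUGUCAGAUAAAUAAAAUAAA"
for pos, sense, sirna in candidate_sirnas(mrna):
    print(pos, sense, sirna)
```

The printed `pos`, `sense` and `siRNA` values can be compared directly with the first three columns of the inference output.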
cpan Statistics::Lite
cpan Bio::TreeIO
# You also need to install the ViennaRNA package, add it to your PATH, and set PERL5LIB to your own path.
# To predict off-target effects, provide the ORF and UTR FASTA files of the mRNA. The sequences must appear in the same order in both files. Refer to the example data.
cd ./PITA && make
cd ..
python scripts/main.py --infer 1 --infer_fasta ./data/example.fa --infer_output ./result/ --offtarget True --ORF_fasta ./data/ORF_example.fa --UTR_fasta ./data/UTR_example.fa
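The off-target command above expects the ORF and UTR files to list their sequences in the same order. A quick pre-flight check along the following lines may catch mismatches before a long run. This is a hypothetical helper, not part of OligoFormer, and it assumes that matching records share the same identifier (the first word after `>`) in both files, which the README does not state.

```python
# Illustrative helper; not part of OligoFormer.
# Assumption: matching ORF/UTR records share the same FASTA identifier.
def fasta_ids(path):
    """Return record IDs (first word after '>') in file order."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids


def check_same_order(orf_path, utr_path):
    """Raise ValueError if the files differ in record count or ID order."""
    orf_ids, utr_ids = fasta_ids(orf_path), fasta_ids(utr_path)
    if len(orf_ids) != len(utr_ids):
        raise ValueError(
            f"{len(orf_ids)} ORF records vs {len(utr_ids)} UTR records"
        )
    for index, (orf_id, utr_id) in enumerate(zip(orf_ids, utr_ids)):
        if orf_id != utr_id:
            raise ValueError(f"record {index}: {orf_id!r} != {utr_id!r}")
    return len(orf_ids)
```

For example, `check_same_order("./data/ORF_example.fa", "./data/UTR_example.fa")` returns the number of paired records, or raises an error if the two files diverge.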
pos,sense,siRNA,efficacy,PITA_Score,Scaled_PITA_Score,PheLiM_Score,Scaled_PheLiM_Score,off_target_score,Immune_Motif,evaluation_score
0,UGAAUUUUUGUCAGAUAAA,UUUAUCUGACAAAAAUUCA,0.9139741711020468,-10.07,1.0,0.28300000000000003,0.2866379310344829,0.671262782443687,0,0.5351767612351033
1,GAAUUUUUGUCAGAUAAAU,AUUUAUCUGACAAAAAUUC,0.8864658409953117,-8.17,0.6207584830339321,0.15,0.0,0.0,0,1.0
2,AAUUUUUGUCAGAUAAAUA,UAUUUAUCUGACAAAAAUU,0.8159810004830359,-8.96,0.7784431137724552,0.23399999999999999,0.1810344827586207,0.4425563810483377,0,0.4636817856550889
3,AUUUUUGUCAGAUAAAUAA,UUAUUUAUCUGACAAAAAU,0.8179122650027275,-7.7,0.5269461077844312,0.5660000000000001,0.8965517241379312,1.0,0,0.20202874870817145
4,UUUUUGUCAGAUAAAUAAA,UUUAUUUAUCUGACAAAAA,0.7880132337212562,-7.4,0.4670658682634731,0.368,0.4698275862068966,0.705735069085299,0,0.26680191635745165
5,UUUUGUCAGAUAAAUAAAA,UUUUAUUUAUCUGACAAAA,0.7990648913383483,-7.58,0.502994011976048,0.38,0.49568965517241387,0.7522461644008175,0,0.26754605953645344
6,UUUGUCAGAUAAAUAAAAU,AUUUUAUUUAUCUGACAAA,0.7055106237530708,-7.63,0.5129740518962076,0.5720000000000001,0.9094827586206898,0.9882482133646434,0,0.0
7,UUGUCAGAUAAAUAAAAUA,UAUUUUAUUUAUCUGACAA,0.7850472775697708,-7.37,0.46107784431137727,0.614,1.0,0.9508590794451448,0,0.15983727224740343
8,UGUCAGAUAAAUAAAAUAA,UUAUUUUAUUUAUCUGACA,0.8157202693819999,-6.53,0.2934131736526947,0.61,0.9913793103448276,0.682184317952127,0,0.33302818248637234
9,GUCAGAUAAAUAAAAUAAA,UUUAUUUUAUUUAUCUGAC,0.8842068641781807,-5.06,0.0,0.19899999999999998,0.10560344827586204,0.0,0,0.9941495828005047
# pos: start position of the siRNA target site on the mRNA
# sense: sense strand sequence, complementary to the siRNA
# siRNA: siRNA sequence
# efficacy: predicted efficacy of the siRNA
# PITA_Score: PITA score of the siRNA
# Scaled_PITA_Score: PITA score of the siRNA scaled to 0-1
# PheLiM_Score: PheLiM score of the siRNA
# Scaled_PheLiM_Score: PheLiM score of the siRNA scaled to 0-1
# off_target_score: off-target score of the siRNA, the harmonic mean of Scaled_PITA_Score and Scaled_PheLiM_Score
# Immune_Motif: number of immune-inducing motifs in the siRNA
# evaluation_score: evaluation score of the siRNA, calculated from efficacy and off_target_score
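The `off_target_score` column can be illustrated with a short worked example. The sketch below is not OligoFormer's implementation. It takes the harmonic mean of the two scaled scores and then rescales the result to 0-1 across the batch. The rescaling step is an inference from the example rows above, which it reproduces, not something the column description states. The function names are hypothetical.

```python
# Illustrative sketch; inferred from the example rows, not from the source code.
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative numbers; 0 if either is 0."""
    return 0.0 if a == 0 or b == 0 else 2 * a * b / (a + b)


def min_max_scale(values):
    """Rescale values linearly to the range 0-1 (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# (Scaled_PITA_Score, Scaled_PheLiM_Score) for the ten example rows above.
scaled_pairs = [
    (1.0, 0.2866379310344829),
    (0.6207584830339321, 0.0),
    (0.7784431137724552, 0.1810344827586207),
    (0.5269461077844312, 0.8965517241379312),
    (0.4670658682634731, 0.4698275862068966),
    (0.502994011976048, 0.49568965517241387),
    (0.5129740518962076, 0.9094827586206898),
    (0.46107784431137727, 1.0),
    (0.2934131736526947, 0.9913793103448276),
    (0.0, 0.10560344827586204),
]

raw = [harmonic_mean(pita, phelim) for pita, phelim in scaled_pairs]
off_target = min_max_scale(raw)
for pos, score in enumerate(off_target):
    print(pos, score)
```

Because the harmonic mean is zero whenever either scaled score is zero, rows 1 and 9 receive an off-target score of 0 even though their other score is non-zero.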
The Docker image simplifies the installation and setup process, making it easy for users to get started with OligoFormer without worrying about dependencies and environment configuration.
Pull the Docker Image:
docker pull yilanbai/oligoformer:v1.0
docker pull registry.cn-hangzhou.aliyuncs.com/yilanbai/oligoformer:v1.0 # another source if you fail to pull from the first source
Run the Docker Container:
docker run -it --name oligoformer-container -dt --restart unless-stopped yilanbai/oligoformer:v1.0 && docker exec -it oligoformer-container bash
Access the OligoFormer Tool:
Once inside the container, you can start using OligoFormer with the following commands:
oligoformer -h # help
oligoformer # infer
oligoformer -o # infer with off-target prediction
oligoformer -to # infer with toxicity prediction
oligoformer -o -to # infer with off-target and toxicity prediction
oligoformer -m 2 # mismatch input 19nt siRNA
oligoformer -i 0 -t # test inter-dataset
oligoformer -i 0 -s -t # test intra-dataset
# We recommend running the following two commands on a platform with GPUs.
oligoformer -i 0 # train inter-dataset
oligoformer -i 0 -s # train intra-dataset
This tool is for research purposes only and is not approved for clinical use. It shall not be used for commercial purposes without permission.