lulab / OligoFormer

GNU General Public License v3.0

OligoFormer

python >3.8.20

Gene silencing through RNA interference (RNAi) has emerged as a powerful tool for studying gene function and developing therapeutics[1]. Small interfering RNA (siRNA) molecules play a crucial role in RNAi by targeting specific mRNA sequences for degradation. Identifying highly efficient siRNA molecules is essential for successful gene silencing experiments and therapeutic applications. Built on the transformer architecture[2], OligoFormer can capture multi-dimensional features and learn complex patterns of siRNA-mRNA interactions for siRNA efficacy prediction.

Datasets

OligoFormer was trained on the dataset of mRNA-siRNA pairs with experimentally measured efficacy published by Huesken et al. [3]. The training data consist of diverse mRNA sequences and their corresponding siRNA molecules with known efficacies.

Dataset       siRNA number   Cell line
Huesken       2431           H1299
Reynolds      240            HEK293
Vickers       76             T24
Harborth      44             HeLa
Ui-Tei        62             HeLa
Khvorova      14             HEK293
Hsieh         108            HEK293T
Amarzguioui   46             Cos-1, HaCaT
Takayuki      702            HeLa

Model

[Figure: OligoFormer architecture]

Installation

OligoFormer environment

Clone the repository and create the OligoFormer conda environment.

# Clone the OligoFormer repository from GitHub
git clone https://github.com/lulab/OligoFormer.git
cd ./OligoFormer
# Create the conda environment (dependencies are installed later from requirements.txt)
conda create -n oligoformer "python=3.8*"

RNA-FM environment

source 1: Download the packaged RNA-FM.

wget "https://cloud.tsinghua.edu.cn/f/46d71884ee8848b3a958/?dl=1" -O RNA-FM.tar.gz
tar -zxvf RNA-FM.tar.gz

source 2: Create the environment of RNA-FM[4].

git clone https://github.com/ml4bio/RNA-FM.git
cd ./RNA-FM
conda env create --name RNA-FM -f environment.yml

Download the pre-trained models from this Google Drive link and place the .pth files into the pretrained folder.

Usage

You need at least one NVIDIA GPU with a working driver on your system to run training or inference.
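Before starting, it can help to confirm that a GPU and driver are visible. The following is a minimal sketch, not part of OligoFormer, that relies on the `nvidia-smi` utility shipped with the NVIDIA driver; the function name `list_nvidia_gpus` is hypothetical.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return the lines printed by `nvidia-smi -L`, or [] if no GPU/driver is found."""
    # nvidia-smi is installed with the NVIDIA driver; if it is not on PATH,
    # the driver is probably missing.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # One non-empty line per detected GPU.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
if gpus:
    print("Detected GPUs:")
    for gpu in gpus:
        print("  " + gpu)
else:
    print("No NVIDIA GPU or driver detected.")
```

An empty result only means that `nvidia-smi` could not report a device; it does not check whether the installed deep learning framework can use the GPU.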

1. Activate the conda environment and install the dependencies

source activate oligoformer
pip install -r requirements.txt

2. Model training

# The following command takes ~60 min on a V100 GPU
python scripts/main.py --datasets Hu Mix --cuda 0 --learning_rate 0.0001 --batch_size 16 --epoch 200 --early_stopping 30
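The `--epoch 200 --early_stopping 30` pair suggests a standard early-stopping loop. The sketch below illustrates that general technique under the assumption that `--early_stopping` is a patience measured in epochs and that a higher validation score is better; it is not taken from `scripts/main.py`, and the function name and the precomputed `validation_scores` list are hypothetical stand-ins for a real train/validate step.

```python
def train_with_early_stopping(validation_scores, max_epochs=200, patience=30):
    """Simulate early stopping over a list of per-epoch validation scores.

    Returns (best_epoch, best_score, last_epoch_run), with 0-based epochs.
    """
    best_score = float("-inf")
    best_epoch = -1
    last_epoch = -1
    epochs_without_improvement = 0

    for epoch, score in enumerate(validation_scores[:max_epochs]):
        last_epoch = epoch
        if score > best_score:
            # New best model: remember it and reset the patience counter.
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs

    return best_epoch, best_score, last_epoch


# The score peaks at epoch 2 and never improves again.
scores = [0.5, 0.6, 0.7] + [0.65] * 50
print(train_with_early_stopping(scores))
```

With these scores the loop stops 30 epochs after the best epoch instead of running all 200, which is why the actual training time can be shorter than the epoch limit implies.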

3.Model inference

3.1 Inference without off-target

Option 1: Provide a FASTA file of the mRNA sequence (the mRNA is scanned with a 19-nt sliding window).

python scripts/main.py --infer 1 --infer_fasta ./data/example.fa --infer_output ./result/
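To make the 19-nt window traversal concrete, here is a minimal sketch of how candidate sites could be enumerated. It is an illustration only: the function name `candidate_sirnas`, the 0-based positions, and the choice to report the antisense strand as the reverse complement of each window are assumptions, not OligoFormer's actual implementation.

```python
# Complement table for RNA bases; other characters (e.g. N) are left unchanged.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def candidate_sirnas(mrna, window=19):
    """Yield (start, target_site, antisense) for each window along the mRNA.

    start is 0-based; antisense is the reverse complement of the target site,
    written 5'->3'.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    for start in range(len(mrna) - window + 1):
        site = mrna[start:start + window]
        antisense = site.translate(COMPLEMENT)[::-1]
        yield start, site, antisense


example_mrna = "AUGGCUAGCUAGGCUAACGGAUCCA"  # made-up 25-nt sequence
for start, site, antisense in candidate_sirnas(example_mrna):
    print(start, site, antisense)
```

A sequence of length L yields L - 19 + 1 windows, so the 25-nt example produces 7 candidates and a sequence shorter than 19 nt produces none.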

Option 2: Provide FASTA files of the mRNA and of specific siRNAs (only these siRNAs are scored).

python scripts/main.py --infer 1 -i1 data/example.fa -i2 data/example_siRNA.fa
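Before running Option 2, it may be worth checking that each siRNA in the second file actually matches the mRNA in the first. The sketch below is a hypothetical pre-check, not part of OligoFormer: `read_fasta` and `find_target_site` are illustrative names, and because the strand convention of `example_siRNA.fa` is not stated here, the check tries the siRNA both as antisense and as sense strand.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def read_fasta(path):
    """Parse a FASTA file into a dict mapping header text -> sequence."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].strip()
                records[name] = []  # a repeated header overwrites the earlier record
            elif name is not None:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def find_target_site(mrna, sirna):
    """Return the 0-based position of the siRNA's target site in the mRNA, or -1."""
    mrna = mrna.upper().replace("T", "U")
    sirna = sirna.upper().replace("T", "U")
    # Case 1: the siRNA is the antisense strand, so its reverse complement
    # should occur in the mRNA.
    position = mrna.find(sirna.translate(COMPLEMENT)[::-1])
    if position == -1:
        # Case 2: the siRNA is given as the sense strand.
        position = mrna.find(sirna)
    return position
```

A typical use would be to load both files with `read_fasta`, call `find_target_site` for every siRNA, and inspect any that return -1 before starting inference.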

Option 3: Input the mRNA sequence manually.

python scripts/main.py --infer 2

3.2 Inference with off-target

Off-target pipeline

source 1: CPAN

cpan Statistics::Lite
cpan Bio::TreeIO
# You also need to install the ViennaRNA package, add it to your PATH, and adjust PERL5LIB to your own path.
# To predict off-target effects, provide the ORF and UTR FASTA files of the mRNA. The sequences must appear in the same order in both files. Refer to the example data.
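Since the ORF and UTR files must list their sequences in the same order, a quick check before running the off-target pipeline can catch mismatches early. The sketch below assumes that both files use the same record ID (the first word of each header line) for the same transcript; that naming convention is an assumption, so compare against the example data first. The function names are hypothetical.

```python
def fasta_ids(path):
    """Return the record IDs (first word of each header line) in file order."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids


def order_mismatches(orf_path, utr_path):
    """Return a list of (index, orf_id, utr_id) where the two files disagree."""
    orf_ids = fasta_ids(orf_path)
    utr_ids = fasta_ids(utr_path)
    if len(orf_ids) != len(utr_ids):
        raise ValueError(
            f"record count differs: {len(orf_ids)} ORF vs {len(utr_ids)} UTR"
        )
    return [
        (index, orf_id, utr_id)
        for index, (orf_id, utr_id) in enumerate(zip(orf_ids, utr_ids))
        if orf_id != utr_id
    ]
```

An empty list means the two files agree record by record; any returned tuple points to the first place where the order should be fixed.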

source 2: Download

wget "https://cloud.tsinghua.edu.cn/f/cab2afdf951140a48fec/?dl=1" -O PerlLib.zip
unzip PerlLib.zip
export PERL5LIB=$(pwd)/PerlLib:$PERL5LIB
python scripts/main.py --infer 1 --infer_fasta ./data/example.fa --infer_output ./result/ -off -tox

User-friendly Docker image

Docker

The Docker image simplifies the installation and setup process, making it easy for users to get started with OligoFormer without worrying about dependencies and environment configuration.

Prerequisites

Installation

  1. Pull the Docker Image:

    You just need to choose one source.

    source 1: DockerHub

    docker pull yilanbai/oligoformer:v1.0

    source 2: Aliyun

    docker pull registry.cn-hangzhou.aliyuncs.com/yilanbai/oligoformer:v1.0

    source 3: Tsinghua Cloud

    Download Link

  2. Run the Docker Container:

    docker run -it --name oligoformer-container -dt --restart unless-stopped yilanbai/oligoformer:v1.0 && docker exec -it oligoformer-container bash

  3. Access the OligoFormer Tool:

    Once inside the container, you can start using OligoFormer with the following command:

    
    oligoformer -h # help
    oligoformer # infer
    oligoformer -i 1 -i1 data/example.fa -i2 data/example_siRNA.fa # infer only the siRNAs of interest (faster)
    oligoformer -off # infer with off-target prediction
    oligoformer -tox # infer with toxicity prediction
    oligoformer -off -tox # infer with off-target and toxicity prediction
    oligoformer -m 2 # mismatch input 19nt siRNA
    oligoformer -i 0 -t # test inter-dataset
    oligoformer -i 0 -s -t # test intra-dataset
    # We recommend running the following two commands on a platform with GPUs.
    oligoformer -i 0 # train inter-dataset
    oligoformer -i 0 -s # train intra-dataset

References

[1] Zamore, Phillip D., et al. "RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals." Cell 101.1 (2000): 25-33.

[2] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).

[3] Huesken, D., et al. "Design of a genome-wide siRNA library using an artificial neural network." Nature Biotechnology 23 (2005): 995-1001.

[4] Chen, Jiayang, et al. "Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions." arXiv preprint arXiv:2204.00300 (2022).

License and Disclaimer

This tool is for research purposes only and is not approved for clinical use. It shall not be used for commercial purposes without permission.