CasOligo


CasOligo is an R package that identifies 20nt gRNA-target-site oligonucleotide sequences within the 18S rRNA gene for designing taxon-specific gRNAs. These gRNAs are used in CRISPR-Cas Selective Amplicon Sequencing (CCSAS; Zhong et al., 2021) to assess the eukaryotic microbiome of hosts (e.g. metazoans, plants). A taxon-specific gRNA directs the Cas nuclease to cut the 18S rRNA gene of the chosen host, but not that of protists and fungi. The result is a sequencing library highly enriched in 18S amplicons from microeukaryotes, which allows high-resolution surveys of the taxonomic composition and structure of the eukaryotic microbes associated with the host. CCSAS thus provides a new and powerful way to obtain high-resolution taxonomic data for the eukaryotic microbiomes of plants and metazoans.

To facilitate the application of CCSAS, we identified gRNA-target-sites for almost all metazoan and plant (metaphyta) taxa currently available in SILVA (Quast et al., 2013), creating a gRNA-target-site database for researchers who want to apply CCSAS to their own organisms. Beyond that, the CasOligo package provides an oligonucleotide design function, cas9.gRNA.oligo2, that can be used to design custom gRNA-target-sites for any gene with a known sequence and a reference database, including other rRNA genes and regions (e.g. 16S, 23S or ITS) and metabolic genes (e.g. COX1). Thus, CCSAS makes it possible to study the genetic diversity of any gene in complex systems, including rare sequences, by removing the sequences that would otherwise dominate the data. The sequence-specific removal of any amplicon has a wide range of applications, including pathogen diagnosis, and studies of symbiosis and microbiome therapy.
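As a rough illustration of what a gRNA-target-site search involves, the base R sketch below lists 20nt sites followed by an NGG PAM (the motif recognised by the commonly used SpCas9) on the forward strand of a toy host sequence, then drops any site that also occurs in toy non-target sequences. It is not the CasOligo algorithm: find_cas9_sites and filter_specific are hypothetical helpers, the sequences are made up, the reverse strand is ignored, and only exact matches are considered.

```r
# Hypothetical helper (not part of CasOligo): list 20nt sites that are
# immediately followed by an NGG PAM on the forward strand of `seq`.
find_cas9_sites <- function(seq, site_len = 20L) {
  seq <- toupper(seq)
  last_start <- nchar(seq) - site_len - 2L  # leave room for the 3nt PAM
  if (last_start < 1L) {
    return(data.frame(start = integer(0), site = character(0),
                      pam = character(0), stringsAsFactors = FALSE))
  }
  starts <- seq_len(last_start)
  sites <- substring(seq, starts, starts + site_len - 1L)
  pams <- substring(seq, starts + site_len, starts + site_len + 2L)
  # Keep unambiguous sites whose PAM matches N-G-G.
  keep <- grepl("^[ACGT]GG$", pams) & grepl("^[ACGT]+$", sites)
  data.frame(start = starts[keep], site = sites[keep], pam = pams[keep],
             stringsAsFactors = FALSE)
}

# Hypothetical helper: keep only sites with no exact match in any
# non-target sequence (a simplification; mismatches are not tolerated).
filter_specific <- function(sites, non_target) {
  non_target <- toupper(non_target)
  hit <- vapply(sites$site,
                function(s) any(grepl(s, non_target, fixed = TRUE)),
                logical(1), USE.NAMES = FALSE)
  sites[!hit, , drop = FALSE]
}

# Toy data (made up for this sketch).
site_a <- "GATTACAGATTACAGATTAC"  # also present in a non-target sequence
site_b <- "CTTAACCTTAACCTTAACCA"  # present in the host only
host <- paste0(site_a, "AGG", site_b, "TGG")
non_target <- c(paste0("TTT", site_a, "CCC"), "ACACACACACACACACACACACAC")

candidate_sites <- find_cas9_sites(host)
specific_sites <- filter_specific(candidate_sites, non_target)
print(specific_sites)
```

In practice the non-target set would be a full reference database, such as the SILVA-derived reference mentioned above, rather than two short strings.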

   

Fig. 1. Distribution of the number of sgRNA-target-sites across metazoan and plant taxa for designing taxon-specific, CRISPR-Cas9-compatible gRNA.

   

Features


   

How does the algorithm work?


   

Installation


To install the latest version from GitHub, simply run the following from an R console:

if (!require("devtools"))
  install.packages("devtools")
devtools::install_github("kevinzhongxu/CasOligo")

     

Dependencies


This package requires the following R packages to be installed beforehand:

   

Citation


 

If you use CasOligo in a publication, please cite our article:

Zhong KX, Cho A, Deeg CM, Chan AM & Suttle CA. (2021) The use of CRISPR-Cas Selective Amplicon Sequencing (CCSAS) to reveal the eukaryotic microbiome of metazoans. Microbiome 9, 230. https://doi.org/10.1186/s40168-021-01180-0.

   

Getting started


 

Example 1: Design the 20nt gRNA-target-site oligonucleotide

This is an example of designing the 20nt gRNA-target-site oligonucleotide for a gRNA of the CRISPR-Cas9 system that cuts the 18S rRNA gene of the host, but not that of protists and fungi.

#To cut the host 18S rRNA gene in the V4 region flanked by the primer set TAReuk454FWD1 and TAReukREV3 (Stoeck et al., 2010), use the cas9.gRNA.oligo1 function, as it is based on the reference database for that region.
cas9.gRNA.oligo1(inseq="Path/To/Your/Input_sequence_fasta_file.fasta", target="Taxonomic_group_of_a_host")

#If you do NOT want to predict the gRNA's target range within a host taxonomic group, omit the target argument.
cas9.gRNA.oligo1(inseq="Path/To/Your/Input_sequence_fasta_file.fasta")

#If your input fasta file contains more than one sequence and you want to check the host target range among these sequences, and among these plus all related sequences from SILVA, use cas9.gRNA.oligo1m.
cas9.gRNA.oligo1m(inseq="Path/To/Your/Input_sequence_fasta_file.fasta", target="Taxonomic_group_of_a_host")
cas9.gRNA.oligo1m(inseq="Path/To/Your/Input_sequence_fasta_file.fasta")

#To target another region of the 18S rRNA gene amplified by different primers, or any other gene, use the cas9.gRNA.oligo2() function; you need to generate your own reference database.
cas9.gRNA.oligo2(inseq="/home/kevin/Desktop/data/human.fasta", refseq="Path/To/Your/Reference_database_file.fasta", target="Homo_sapiens")
cas9.gRNA.oligo2(inseq="/home/kevin/Desktop/data/human.fasta", refseq="Path/To/Your/Reference_database_file.fasta")
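
Since the choice between cas9.gRNA.oligo1 and cas9.gRNA.oligo1m depends on how many sequences the input file holds, it can help to count the records first. The sketch below does this in base R on a small temporary file; count_fasta_records is a hypothetical helper that is not part of CasOligo, and the suggestion it prints only follows the comments above.

```r
# Hypothetical helper (not part of CasOligo): count FASTA records by
# counting header lines, i.e. lines that start with ">".
count_fasta_records <- function(path) {
  lines <- readLines(path, warn = FALSE)
  sum(startsWith(lines, ">"))
}

# Write a small two-record FASTA file to a temporary location for the demo.
demo_fasta <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT", "ACGT", ">seq2", "TTGGCCAA"), demo_fasta)

n_records <- count_fasta_records(demo_fasta)

# Per the comments above: the "1m" variant is meant for multi-sequence input.
suggested <- if (n_records > 1) "cas9.gRNA.oligo1m" else "cas9.gRNA.oligo1"
message(n_records, " record(s) found; consider ", suggested)
```

Replace demo_fasta with the path to your own input file to run the same check on real data.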

 

Example 2: Design the 20nt gRNA-target-site oligonucleotide for the 18S sequence of the Pacific oyster

This is an example of designing the 20nt gRNA-target-site oligonucleotide for a gRNA of the CRISPR-Cas9 system that cuts the 18S rRNA gene of the Pacific oyster Crassostrea gigas, but not that of protists and fungi.


#First, get the path to the Pacific oyster 18S sequence in fasta format (V4 region of the 18S rRNA gene flanked by the primers TAReuk454FWD1 and TAReukREV3) that ships with the package.
input_fasta_file <- system.file("extdata", "pacific_oyster_18S_V4.fasta", package = "CasOligo")

#To design gRNA for the oyster 18S sequence and predict the sgRNA's target-range among other "Crassostrea_gigas" sequences in SILVA.
cas9.gRNA.oligo1(inseq=input_fasta_file, target="Crassostrea_gigas")

#To design gRNA for the oyster 18S sequence and predict the sgRNA's target-range among other "Ostreidae" sequences in SILVA.
cas9.gRNA.oligo1(inseq=input_fasta_file, target="Ostreidae")

#To design gRNA for the oyster 18S sequence and predict the sgRNA's target-range among other "Mollusca" sequences in SILVA.
cas9.gRNA.oligo1(inseq=input_fasta_file, target="Mollusca")

#To design gRNA for the oyster 18S sequence without predicting the sgRNA's target-range among other taxonomic groups.
cas9.gRNA.oligo1(inseq=input_fasta_file)

 

Example 3: Retrieve the 20nt gRNA-target-site oligonucleotide sequence from database

We have already built a database of gRNA-target-sites (Zhong et al., 2021) for almost all metazoan and plant species available in SILVA (Quast et al., 2013).

If you know the name of the host taxon you want to cut, you can use the search.db.byname function to retrieve the oligo.

#To search the database successfully, the taxon name must match the name used in the SILVA database
search.db.byname(query="Host_species or Host_taxonomic group", cas="Name_of_Cas")

search.db.byname(query="Homo sapiens", cas="Cas9")
search.db.byname(query="Salmon", cas="Cas9")
search.db.byname(query="Mollusca", cas="Cas9")
search.db.byname(query="Crassostrea gigas", cas="Cas9")

search.db.byname(query="Homo sapiens", cas="Cas12a")
search.db.byname(query="Salmon", cas="Cas12a")
search.db.byname(query="Mollusca", cas="Cas12a")
search.db.byname(query="Crassostrea gigas", cas="Cas12a")
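
In the examples in this README, taxon names are written with underscores for the target argument of the design functions (e.g. "Crassostrea_gigas") and with spaces for the query argument of search.db.byname (e.g. "Crassostrea gigas"). Assuming the two forms differ only in that separator, the hypothetical helpers below (not part of CasOligo) convert between them in base R.

```r
# Hypothetical helpers (not part of CasOligo). They assume the only
# difference between the two name forms is space versus underscore.

# "Crassostrea gigas" -> "Crassostrea_gigas" (form used for `target`)
to_target_label <- function(name) {
  gsub("\\s+", "_", trimws(name))
}

# "Crassostrea_gigas" -> "Crassostrea gigas" (form used for `query`)
to_query_label <- function(label) {
  gsub("_", " ", label, fixed = TRUE)
}

target_label <- to_target_label("Crassostrea gigas")
query_label <- to_query_label("Homo_sapiens")
print(c(target_label, query_label))
```

Whether a given name is actually present in the database still depends on it matching the SILVA taxonomy.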

If you want more details about a gRNA-target-site, you can use the search.db.byid function as follows.

search.db.byid(query="ID_of_the_gRNA-target-site", cas="Name_of_Cas")

search.db.byid(query="probe_022593", cas="Cas9")

 

References


Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).

Stoeck, T. et al. Multiple marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water. Mol. Ecol. 19, 21–31 (2010).

Zhong KX, Cho A, Deeg CM, Chan AM & Suttle CA. (2021) The use of CRISPR-Cas Selective Amplicon Sequencing (CCSAS) to reveal the eukaryotic microbiome of metazoans. Microbiome 9, 230. https://doi.org/10.1186/s40168-021-01180-0.

 

License


 

This work is subject to the MIT License.

   

 


A work by Kevin Xu ZHONG

xzhong@eoas.ubc.ca