Logolas is an R package for creating Enrichment Depletion Logo (EDLogo) plots, which highlight both enrichment and depletion of symbols, unlike standard logo plots (for example, those produced by the seqLogo package), which are biased towards highlighting enrichment. Logolas also generalizes logo plots so that symbols can be strings as well as single characters.
If you find a bug, please create an issue.
This code has been tested in ...
<img src="utils/figures/misc2.png" alt="misc" height="400" width="700" align = "middle">
Copyright (c) 2018-2019, Kushal Dey.
All source code and software in this repository are made available under the terms of the GNU General Public License. See the LICENSE file for the full text of the license.
If you find this R package useful for your work, please cite our paper, published in BMC Bioinformatics:
Dey, K.K., Xie, D. and Stephens, M., 2018. A new sequence logo plot to highlight enrichment and depletion. BMC Bioinformatics. 19:473 https://doi.org/10.1186/s12859-018-2489-3.
The most recent version of Logolas is available from GitHub and can be installed with the devtools R package. First, install the following dependencies from Bioconductor and CRAN using BiocManager:
install.packages("BiocManager")
BiocManager::install(c("Biostrings","BiocStyle","Biobase","seqLogo","ggseqlogo"))
Then install Logolas as follows
library(devtools)
install_github("kkdey/Logolas",build_vignettes = TRUE)
Once the package is installed, load it in R:
library(Logolas)
To get an overview of the package, enter
help(package = "Logolas")
Next, try creating a few plots using the logomaker function.
Create a standard logo plot in Logolas, analogous to those produced by the seqLogo and ggseqlogo R packages:
sequence <- c("CTATTGT","CTCTTAT","CTATTAA","CTATTTA", "CTATTAT","CTTGAAT",
"CTTAGAT","CTATTAA","CTATTTA","CTATTAT", "CTTTTAT","CTATAGT",
"CTATTTT","CTTATAT","CTATATT","CTCATTT", "CTTATTT","CAATAGT",
"CATTTGA","CTCTTAT","CTATTAT","CTTTTAT", "CTATAAT","CTTAGGT",
"CTATTGT","CTCATGT","CTATAGT", "CTCGTTA","CTAGAAT","CAATGGT")
logomaker(sequence,type = "Logo")
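For comparison, the sketch below computes in base R the quantities that a classical information-content logo (such as seqLogo's default) displays for `sequence`: the proportion of each base at each position, scaled by that position's information content in bits. It is an illustration, not Logolas's implementation; it ignores any small-sample correction, and the names `pfm`, `ic` and `heights` are our own.

```r
# Sketch only: quantities behind an information-content logo, in base R.
sequence <- c("CTATTGT","CTCTTAT","CTATTAA","CTATTTA","CTATTAT","CTTGAAT",
              "CTTAGAT","CTATTAA","CTATTTA","CTATTAT","CTTTTAT","CTATAGT",
              "CTATTTT","CTTATAT","CTATATT","CTCATTT","CTTATTT","CAATAGT",
              "CATTTGA","CTCTTAT","CTATTAT","CTTTTAT","CTATAAT","CTTAGGT",
              "CTATTGT","CTCATGT","CTATAGT","CTCGTTA","CTAGAAT","CAATGGT")
alphabet <- c("A", "C", "G", "T")

# Character matrix: one row per sequence, one column per position
chars <- do.call(rbind, strsplit(sequence, ""))

# Position frequency matrix: proportion of each base at each position
pfm <- vapply(seq_len(ncol(chars)), function(j) {
  counts <- table(factor(chars[, j], levels = alphabet))
  as.numeric(counts) / nrow(chars)
}, numeric(length(alphabet)))
rownames(pfm) <- alphabet

# Shannon entropy per position in bits, treating 0 * log2(0) as 0
entropy <- -colSums(ifelse(pfm > 0, pfm * log2(pfm), 0))

# Information content: maximum entropy (2 bits for DNA) minus observed entropy
ic <- log2(length(alphabet)) - entropy

# Letter heights: base proportion scaled by the position's information content
heights <- sweep(pfm, 2, ic, "*")
round(heights, 2)
```

Because each height is a proportion multiplied by a non-negative information content, a depleted base can only shrink toward zero in this kind of plot; it never falls below the axis, which is one way to see why depletion is hard to spot in a standard logo.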
The corresponding EDLogo plot highlights the depletion of T in the middle positions, which is not visually clear in the standard logo plot.
logomaker(sequence, type = "EDLogo")
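The EDLogo construction is defined in the BMC Bioinformatics paper cited above. As a rough illustration, assuming the median-centred log-odds score described there, the sketch below scores one made-up position against a uniform background: symbols scoring above the median are drawn above the axis, and symbols below it are drawn under the axis. The proportions are hypothetical, pseudocounts are ignored, and the code is not Logolas's own; `?logomaker` lists the scores the package actually offers.

```r
# Hypothetical symbol proportions at a single position (made-up numbers)
p  <- c(A = 0.45, C = 0.30, G = 0.20, T = 0.05)
# Uniform background: no prior preference among the four bases
bg <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)

# Log-odds of each symbol against the background (base 2; the base only rescales)
log_odds <- log2(p / bg)

# Centre on the median: positive scores go above the axis (enriched),
# negative scores go below it (depleted)
scaled <- log_odds - median(log_odds)

enriched <- scaled[scaled > 0]
depleted <- scaled[scaled < 0]
round(scaled, 3)
```

Here T, which is rare relative to the background, receives a large negative score, so an EDLogo-style plot draws it prominently below the axis instead of as a barely visible short letter.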
EDLogo can also be applied to amino acid motifs, whose alphabet extends beyond the A, C, G and T of DNA motifs.
Here we create an EDLogo plot of the amino acid sequences at N-glycosylation sites, with a user-specified background bg, chosen to be the median positional weight of each amino acid in the context around the glycosylation site (data from UniProtKB).
data("N_Glycosyl_sequences")
# Background: median weight of each amino acid (row) across positions, normalized to sum to 1
bg <- apply(N_Glycosyl_sequences, 1, median)
bg <- bg/sum(bg)
logomaker(N_Glycosyl_sequences, type = "EDLogo", bg=bg)
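To make the background computation concrete, the sketch below applies the same two steps to a small made-up matrix laid out like `N_Glycosyl_sequences` (one row per amino acid, one column per position; this layout is inferred from the `apply(..., 1, ...)` call rather than from the dataset documentation). The matrix name `weights` and its values are hypothetical.

```r
# Hypothetical symbol-by-position weight matrix (made-up numbers);
# matrix() fills column by column, so each group of three is one position
weights <- matrix(c(4, 1, 5,
                    6, 2, 2,
                    2, 1, 7,
                    8, 0, 2),
                  nrow = 3,
                  dimnames = list(c("N", "P", "S"), paste0("pos", 1:4)))

# Median weight of each symbol (row) across positions
row_medians <- apply(weights, 1, median)

# Normalize so the background is a probability vector over symbols
bg <- row_medians / sum(row_medians)
round(bg, 3)
```

One plausible reason for using the median rather than the mean is robustness: a few strongly enriched motif positions do not inflate an amino acid's background weight.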
The EDLogo plot highlights the Asn (N)-X-Ser (S)/Thr (T)-X motif at the center, where X is depleted of the amino acid Pro (P).
Logolas allows the symbols in a logo plot to be a mix of strings and characters, or purely strings, as in the examples below.
A mutation signature example, with the mismatch type at the center and the flanking bases on either side (data from Shiraishi et al. 2015):
data(mutation_sig)
logomaker(mutation_sig, type = "EDLogo", color_type = "per_symbol", color_seed = 2000)
An EDLogo plot of the enrichment and depletion of histone marks in different parts of the genome (data from Koch et al. 2007):
data(histone_marks)
logomaker(histone_marks$mat, bg = histone_marks$bgmat, type = "EDLogo")
<img src="utils/figures/fig4.png" alt="misc" height="200" width="400" align = "middle">
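The mutation-signature and histone-mark examples both use string symbols. The sketch below shows one way to lay out such data in base R, assuming (from the matrix-plus-background call above, not from the Logolas documentation) that logomaker() accepts a matrix with one row per symbol, the symbol strings as row names, and one column per position. The symbol names, position names and counts are all hypothetical.

```r
# Hypothetical counts for three string-valued symbols at three positions
# (made-up numbers; rows = symbols, columns = positions)
counts <- matrix(c(10, 2, 3,
                   1, 12, 2,
                   4, 4, 7),
                 nrow = 3,
                 dimnames = list(c("alpha", "beta", "gamma"),
                                 c("pos1", "pos2", "pos3")))

# Per-position proportions: divide each column by its total
props <- sweep(counts, 2, colSums(counts), "/")
round(props, 3)

# Assumption: with Logolas loaded, a matrix in this layout could be passed
# to logomaker() in the same way as histone_marks$mat above, e.g.
# logomaker(props, type = "EDLogo")
```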
Finally, for more detailed examples, see the vignette:
vignette("Logolas")
The vignette PDF file was generated from the R Markdown source with the following command:
rmarkdown::render("Logolas.Rmd", output_format = "pdf_document")
This software was developed by Kushal Dey, Dongyue Xie and Matthew Stephens at the University of Chicago. For any questions or comments, please contact Kushal Dey at kkdey@uchicago.edu.
The authors would like to acknowledge Oliver Bembom, the author of the seqLogo package, which served as the inspiration and starting point for this software. The authors also thank Peter Carbonetto, Edward Wallace and John Blischak for helpful discussions and feedback.