Open tengfei opened 11 years ago
I also have this problem.
I have a BigWig file I would like to visualize, which shows the mapped read depth from an RNA-seq experiment. This file is one of the ENCODE rnaseq files. Importantly the bigwig only contains entries where there is expression. Bases with zero mapped depth are excluded.
I load a part of this up into R using rtracklayer's import command:
>gr = GRanges(seqnames=c("chr1"), IRanges(from=159500770, to=160000770))
>expres <- import("bigwig.bw", which = gr, asRangedData = F)
>expres
GRanges with 218180 ranges and 1 metadata column
Note that the window is 1Mb big, but my GRanges object only has 218180 entries
I now want to plot this, but if I do it straight off, then the resulting computation takes forever, and would eventaully return more data than can be visualized. So aggregate over windows:
>expression_plot = ggplot(expres, aes(y=score)) + stat_aggregate(aes(y=score),window=1000, geom="bar") + scale_y_log10() + ylab("Read\nDepth") + xlim(gr)
This plots the expected coverage plot at a suitable resolution.
The problem comes when I try to combine this with another track:
> gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)
> tracks(Genes=gene_plot,, Expression=expression_plot, heights=c(3,1)) + theme_alignment(border = F, grid = F) + xlim(gr)
Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
>traceback()
5: scales::expand_range(getLimits(grob)$ylim, mul = 0.05)
4: FUN(X[[2L]], ...)
3: lapply(dots[!fixs & ideo], function(grob) {
scales::expand_range(getLimits(grob)$ylim, mul = 0.05)
})
2: lapply(dots[!fixs & ideo], function(grob) {
scales::expand_range(getLimits(grob)$ylim, mul = 0.05)
})
1: tracks(Genes = gene_plot, Expression = exp_plot, heights = c(3,
1))
I've managed to trace the problem to the fact that stat_aggregate returns NA values if there are no features in a particular window. This NA then propagates through to the data of the layer, and so getLimits returns NA, which is a problem for expand_ranges, although plotting the grob on its own seems fine.
I note the patch above, but my R development skills arn't good enough to apply it myself to our system.
Hi Ian,
Thanks a lot for reporting the bug and sorry for the late reply, would you mind give me a small subset of test data and example code so I could produce the error on my machine and fix it as soon as possible.
cheers
Tengfei
On Fri, Jan 24, 2014 at 11:02 AM, Ian Sudbery notifications@github.comwrote:
I also have this problem.
I have a BigWig file I would like to visualize, which shows the mapped read depth from an RNA-seq experiment. This file is one of the ENCODE rnaseq files. Importantly the bigwig only contains entries where there is expression. Bases with zero mapped depth are excluded.
I load a part of this up into R using rtracklayer's import command:
gr = GRanges(seqnames=c("chr1"), IRanges(from=159500770, to=160000770)) expres <- import("bigwig.bw", which = gr, asRangedData = F) expres GRanges with 218180 ranges and 1 metadata column
Note that the window is 1Mb big, but my GRanges object only has 218180 entries
I now want to plot this, but if I do it straight off, then the resulting computation takes forever, and would eventaully return more data than can be visualized. So aggregate over windows:
expression_plot = ggplot(expres, aes(y=score)) + stat_aggregate(aes(y=score),window=1000, geom="bar") + scale_y_log10() + ylab("Read\nDepth") + xlim(gr)
This plots the expected coverage plot at a suitable resolution.
The problem comes when I try to combine this with another track:
gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)
tracks(Genes=gene_plot,, Expression=expression_plot, heights=c(3,1)) + theme_alignment(border = F, grid = F) + xlim(gr) Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
traceback() 5: scales::expand_range(getLimits(grob)$ylim, mul = 0.05) 4: FUN(X[[2L]], ...) 3: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 2: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 1: tracks(Genes = gene_plot, Expression = exp_plot, heights = c(3, 1))
I've managed to trace the problem to the fact that stat_aggregate returns NA values if there are no features in a particular window. This NA then propagates through to the data of the layer, and so getLimits returns NA, which is a problem for expand_ranges, although plotting the grob on its own seems fine.
I note the patch above, but my R development skills arn't good enough to apply it myself to our system.
Reply to this email directly or view it on GitHubhttps://github.com/tengfei/ggbio/issues/35#issuecomment-33234976 .
Hi tengfei,
I am away for the next three weeks. I will send you some test data when I return if it's still useful.
Ian
Sent from Samsung mobile on O2
-------- Original message -------- From: Tengfei Yin notifications@github.com Date: To: tengfei/ggbio ggbio@noreply.github.com Cc: Ian Sudbery ian.sudbery@dpag.ox.ac.uk Subject: Re: [ggbio] bug from jason (#35)
Hi Ian,
Thanks a lot for reporting the bug and sorry for the late reply, would you mind give me a small subset of test data and example code so I could produce the error on my machine and fix it as soon as possible.
cheers
Tengfei
On Fri, Jan 24, 2014 at 11:02 AM, Ian Sudbery notifications@github.comwrote:
I also have this problem.
I have a BigWig file I would like to visualize, which shows the mapped read depth from an RNA-seq experiment. This file is one of the ENCODE rnaseq files. Importantly the bigwig only contains entries where there is expression. Bases with zero mapped depth are excluded.
I load a part of this up into R using rtracklayer's import command:
gr = GRanges(seqnames=c("chr1"), IRanges(from=159500770, to=160000770)) expres <- import("bigwig.bw", which = gr, asRangedData = F) expres GRanges with 218180 ranges and 1 metadata column
Note that the window is 1Mb big, but my GRanges object only has 218180 entries
I now want to plot this, but if I do it straight off, then the resulting computation takes forever, and would eventaully return more data than can be visualized. So aggregate over windows:
expression_plot = ggplot(expres, aes(y=score)) + stat_aggregate(aes(y=score),window=1000, geom="bar") + scale_y_log10() + ylab("Read\nDepth") + xlim(gr)
This plots the expected coverage plot at a suitable resolution.
The problem comes when I try to combine this with another track:
gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)
tracks(Genes=gene_plot,, Expression=expression_plot, heights=c(3,1)) + theme_alignment(border = F, grid = F) + xlim(gr) Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
traceback() 5: scales::expand_range(getLimits(grob)$ylim, mul = 0.05) 4: FUN(X[[2L]], ...) 3: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 2: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 1: tracks(Genes = gene_plot, Expression = exp_plot, heights = c(3, 1))
I've managed to trace the problem to the fact that stat_aggregate returns NA values if there are no features in a particular window. This NA then propagates through to the data of the layer, and so getLimits returns NA, which is a problem for expand_ranges, although plotting the grob on its own seems fine.
I note the patch above, but my R development skills arn't good enough to apply it myself to our system.
Reply to this email directly or view it on GitHubhttps://github.com/tengfei/ggbio/issues/35#issuecomment-33234976 .
— Reply to this email directly or view it on GitHubhttps://github.com/tengfei/ggbio/issues/35#issuecomment-34101900.
Hi Tengfei,
I am now back in the office and can supply you with the require test data. Please find attached two files, features.gtf.gz and expression_subset.bw.
You can reproduce the error with the following commands
library(rtracklayer) library(GenomicRanges) library(ggbio)
gr = GRanges(seqnames=c("chr1"), IRanges(start= 159500770, end=160000770))
exprs = import("expression_subset.bw", asRangedData=F) expression_plot = ggplot(exprs, aes(y=score)) + stat_aggregate(aes(y=score), window = 1000, geom = "bar") + scale_y_log10() + ylab("Read\nDepth")
genes = import("../illumina1_4c_run3/features.gtf.gz", which = gr, asRangedData=F) gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)
tracks(Genes=gene_plot,Expression=expression_plot, heights=c(3,1)) + theme_alignment(border=F, grid = F) + xlim(gr)
Here is my sessionInfo()
R version 2.15.2 (2012-10-26) Platform: x86_64-unknown-linux-gnu (64-bit)
locale: [1] C
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] ggbio_1.6.6 ggplot2_0.9.3.1 rtracklayer_1.18.2 [4] GenomicRanges_1.10.7 IRanges_1.16.6 BiocGenerics_0.4.0
loaded via a namespace (and not attached): [1] AnnotationDbi_1.20.7 BSgenome_1.26.1 Biobase_2.18.0 [4] Biostrings_2.26.3 DBI_0.2-7 GenomicFeatures_1.10.2 [7] Hmisc_3.12-2 MASS_7.3-23 RColorBrewer_1.0-5 [10] RCurl_1.95-4.1 RSQLite_0.11.4 Rsamtools_1.10.2 [13] VariantAnnotation_1.4.12 XML_3.96-1.1 biomaRt_2.14.0 [16] biovizBase_1.6.2 bitops_1.0-6 cluster_1.14.4 [19] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 [22] grid_2.15.2 gridExtra_0.9.1 gtable_0.1.2 [25] labeling_0.2 lattice_0.20-15 munsell_0.4.2 [28] parallel_2.15.2 plyr_1.8 proto_0.3-10 [31] reshape2_1.2.2 rpart_4.1-1 scales_0.2.3 [34] stats4_2.15.2 stringr_0.6.2 tools_2.15.2 [37] zlibbioc_1.4.0
Hope this makes sense,
EDIT: apparently I can't attach these files here. Download them at http://www.cgat.org/~ians/ggbio_bug/expression_subset.bw http://www.cgat.org/~ians/ggbio_bug/features.gtf.gz
On 02/04/14 20:15, Tengfei Yin wrote:
Hi Ian,
Thanks a lot for reporting the bug and sorry for the late reply, would you mind give me a small subset of test data and example code so I could produce the error on my machine and fix it as soon as possible.
cheers
Tengfei
On Fri, Jan 24, 2014 at 11:02 AM, Ian Sudbery notifications@github.comwrote:
I also have this problem.
I have a BigWig file I would like to visualize, which shows the mapped read depth from an RNA-seq experiment. This file is one of the ENCODE rnaseq files. Importantly the bigwig only contains entries where there is expression. Bases with zero mapped depth are excluded.
I load a part of this up into R using rtracklayer's import command:
gr = GRanges(seqnames=c("chr1"), IRanges(from=159500770, to=160000770)) expres <- import("bigwig.bw", which = gr, asRangedData = F) expres GRanges with 218180 ranges and 1 metadata column
Note that the window is 1Mb big, but my GRanges object only has 218180 entries
I now want to plot this, but if I do it straight off, then the resulting computation takes forever, and would eventaully return more data than can be visualized. So aggregate over windows:
expression_plot = ggplot(expres, aes(y=score)) + stat_aggregate(aes(y=score),window=1000, geom="bar") + scale_y_log10()
- ylab("Read\nDepth") + xlim(gr)
This plots the expected coverage plot at a suitable resolution.
The problem comes when I try to combine this with another track:
gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)
tracks(Genes=gene_plot,, Expression=expression_plot, heights=c(3,1)) + theme_alignment(border = F, grid = F) + xlim(gr) Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
traceback() 5: scales::expand_range(getLimits(grob)$ylim, mul = 0.05) 4: FUN(X[[2L]], ...) 3: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 2: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 1: tracks(Genes = gene_plot, Expression = exp_plot, heights = c(3, 1))
I've managed to trace the problem to the fact that stat_aggregate returns NA values if there are no features in a particular window. This NA then propagates through to the data of the layer, and so getLimits returns NA, which is a problem for expand_ranges, although plotting the grob on its own seems fine.
I note the patch above, but my R development skills arn't good enough to apply it myself to our system.
Reply to this email directly or view it on GitHubhttps://github.com/tengfei/ggbio/issues/35#issuecomment-33234976 .
— Reply to this email directly or view it on GitHub https://github.com/tengfei/ggbio/issues/35#issuecomment-34101900.
Hi Michael,
I found a bug in the bbgio function getLimits: NA values present in the y values give NA for both min and max values. This leads to an error when combining plots using tracks. Here's the line in particular:
res <- suppressWarnings(list(xlim = c(min(c(l.res$xmin, x, xmin)), max(c(l.res$xmax, x, xmax, xend))), ylim = c(min(c(l.res$ymin, y, ymin)), max(c(l.res$ymax, y, ymax, yend)))))
here's my fix:
res <- suppressWarnings(list(xlim = c(min(c(l.res$xmin, x, xmin), na.rm=TRUE), max(c(l.res$xmax, x, xmax, xend), na.rm=TRUE)), ylim = c(min(c(l.res$ymin, y, ymin), na.rm=TRUE), max(c(l.res$ymax, y, ymax, yend), na.rm=TRUE))))
There's also some unusual behavior when combining tracks when you set legend.position="none". I've attached an example plot. There's excess space on the right margin where the legend should be but isn't.
J