lawremi / ggbio

Grid and ggplot2 based visualization for biological data
111 stars 24 forks source link

bug from jason #35

Open tengfei opened 11 years ago

tengfei commented 11 years ago

Hi Michael,

I found a bug in the bbgio function getLimits: NA values present in the y values give NA for both min and max values. This leads to an error when combining plots using tracks. Here's the line in particular:

res <- suppressWarnings(list(xlim = c(min(c(l.res$xmin, x, xmin)), max(c(l.res$xmax, x, xmax, xend))), ylim = c(min(c(l.res$ymin, y, ymin)), max(c(l.res$ymax, y, ymax, yend)))))

here's my fix:

res <- suppressWarnings(list(xlim = c(min(c(l.res$xmin, x, xmin), na.rm=TRUE), max(c(l.res$xmax, x, xmax, xend), na.rm=TRUE)), ylim = c(min(c(l.res$ymin, y, ymin), na.rm=TRUE), max(c(l.res$ymax, y, ymax, yend), na.rm=TRUE))))

There's also some unusual behavior when combining tracks when you set legend.position="none". I've attached an example plot. There's excess space on the right margin where the legend should be but isn't.

J

IanSudbery commented 10 years ago

I also have this problem.

I have a BigWig file I would like to visualize, which shows the mapped read depth from an RNA-seq experiment. This file is one of the ENCODE rnaseq files. Importantly the bigwig only contains entries where there is expression. Bases with zero mapped depth are excluded.

I load a part of this up into R using rtracklayer's import command:

>gr = GRanges(seqnames=c("chr1"), IRanges(from=159500770, to=160000770))
>expres <- import("bigwig.bw", which = gr, asRangedData = F)
>expres
GRanges with 218180 ranges and 1 metadata column

Note that the window is 1Mb big, but my GRanges object only has 218180 entries

I now want to plot this, but if I do it straight off, then the resulting computation takes forever, and would eventaully return more data than can be visualized. So aggregate over windows:

>expression_plot = ggplot(expres, aes(y=score)) + stat_aggregate(aes(y=score),window=1000, geom="bar") + scale_y_log10() + ylab("Read\nDepth") + xlim(gr)

This plots the expected coverage plot at a suitable resolution.

The problem comes when I try to combine this with another track:

> gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)

> tracks(Genes=gene_plot,, Expression=expression_plot, heights=c(3,1)) + theme_alignment(border = F, grid = F) + xlim(gr)
Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed

>traceback()
5: scales::expand_range(getLimits(grob)$ylim, mul = 0.05)
4: FUN(X[[2L]], ...)
3: lapply(dots[!fixs & ideo], function(grob) {
       scales::expand_range(getLimits(grob)$ylim, mul = 0.05)
   })
2: lapply(dots[!fixs & ideo], function(grob) {
       scales::expand_range(getLimits(grob)$ylim, mul = 0.05)
   })
1: tracks(Genes = gene_plot, Expression = exp_plot, heights = c(3, 
       1))

I've managed to trace the problem to the fact that stat_aggregate returns NA values if there are no features in a particular window. This NA then propagates through to the data of the layer, and so getLimits returns NA, which is a problem for expand_ranges, although plotting the grob on its own seems fine.

I note the patch above, but my R development skills arn't good enough to apply it myself to our system.

tengfei commented 10 years ago

Hi Ian,

Thanks a lot for reporting the bug and sorry for the late reply, would you mind give me a small subset of test data and example code so I could produce the error on my machine and fix it as soon as possible.

cheers

Tengfei

On Fri, Jan 24, 2014 at 11:02 AM, Ian Sudbery notifications@github.comwrote:

I also have this problem.

I have a BigWig file I would like to visualize, which shows the mapped read depth from an RNA-seq experiment. This file is one of the ENCODE rnaseq files. Importantly the bigwig only contains entries where there is expression. Bases with zero mapped depth are excluded.

I load a part of this up into R using rtracklayer's import command:

gr = GRanges(seqnames=c("chr1"), IRanges(from=159500770, to=160000770)) expres <- import("bigwig.bw", which = gr, asRangedData = F) expres GRanges with 218180 ranges and 1 metadata column

Note that the window is 1Mb big, but my GRanges object only has 218180 entries

I now want to plot this, but if I do it straight off, then the resulting computation takes forever, and would eventaully return more data than can be visualized. So aggregate over windows:

expression_plot = ggplot(expres, aes(y=score)) + stat_aggregate(aes(y=score),window=1000, geom="bar") + scale_y_log10() + ylab("Read\nDepth") + xlim(gr)

This plots the expected coverage plot at a suitable resolution.

The problem comes when I try to combine this with another track:

gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)

tracks(Genes=gene_plot,, Expression=expression_plot, heights=c(3,1)) + theme_alignment(border = F, grid = F) + xlim(gr) Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed

traceback() 5: scales::expand_range(getLimits(grob)$ylim, mul = 0.05) 4: FUN(X[[2L]], ...) 3: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 2: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 1: tracks(Genes = gene_plot, Expression = exp_plot, heights = c(3, 1))

I've managed to trace the problem to the fact that stat_aggregate returns NA values if there are no features in a particular window. This NA then propagates through to the data of the layer, and so getLimits returns NA, which is a problem for expand_ranges, although plotting the grob on its own seems fine.

I note the patch above, but my R development skills arn't good enough to apply it myself to our system.

Reply to this email directly or view it on GitHubhttps://github.com/tengfei/ggbio/issues/35#issuecomment-33234976 .

IanSudbery commented 10 years ago

Hi tengfei,

I am away for the next three weeks. I will send you some test data when I return if it's still useful.

Ian

Sent from Samsung mobile on O2

-------- Original message -------- From: Tengfei Yin notifications@github.com Date: To: tengfei/ggbio ggbio@noreply.github.com Cc: Ian Sudbery ian.sudbery@dpag.ox.ac.uk Subject: Re: [ggbio] bug from jason (#35)

Hi Ian,

Thanks a lot for reporting the bug and sorry for the late reply, would you mind give me a small subset of test data and example code so I could produce the error on my machine and fix it as soon as possible.

cheers

Tengfei

On Fri, Jan 24, 2014 at 11:02 AM, Ian Sudbery notifications@github.comwrote:

I also have this problem.

I have a BigWig file I would like to visualize, which shows the mapped read depth from an RNA-seq experiment. This file is one of the ENCODE rnaseq files. Importantly the bigwig only contains entries where there is expression. Bases with zero mapped depth are excluded.

I load a part of this up into R using rtracklayer's import command:

gr = GRanges(seqnames=c("chr1"), IRanges(from=159500770, to=160000770)) expres <- import("bigwig.bw", which = gr, asRangedData = F) expres GRanges with 218180 ranges and 1 metadata column

Note that the window is 1Mb big, but my GRanges object only has 218180 entries

I now want to plot this, but if I do it straight off, then the resulting computation takes forever, and would eventaully return more data than can be visualized. So aggregate over windows:

expression_plot = ggplot(expres, aes(y=score)) + stat_aggregate(aes(y=score),window=1000, geom="bar") + scale_y_log10() + ylab("Read\nDepth") + xlim(gr)

This plots the expected coverage plot at a suitable resolution.

The problem comes when I try to combine this with another track:

gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)

tracks(Genes=gene_plot,, Expression=expression_plot, heights=c(3,1)) + theme_alignment(border = F, grid = F) + xlim(gr) Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed

traceback() 5: scales::expand_range(getLimits(grob)$ylim, mul = 0.05) 4: FUN(X[[2L]], ...) 3: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 2: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 1: tracks(Genes = gene_plot, Expression = exp_plot, heights = c(3, 1))

I've managed to trace the problem to the fact that stat_aggregate returns NA values if there are no features in a particular window. This NA then propagates through to the data of the layer, and so getLimits returns NA, which is a problem for expand_ranges, although plotting the grob on its own seems fine.

I note the patch above, but my R development skills arn't good enough to apply it myself to our system.

Reply to this email directly or view it on GitHubhttps://github.com/tengfei/ggbio/issues/35#issuecomment-33234976 .

— Reply to this email directly or view it on GitHubhttps://github.com/tengfei/ggbio/issues/35#issuecomment-34101900.

IanSudbery commented 10 years ago

Hi Tengfei,

I am now back in the office and can supply you with the require test data. Please find attached two files, features.gtf.gz and expression_subset.bw.

You can reproduce the error with the following commands

library(rtracklayer) library(GenomicRanges) library(ggbio)

gr = GRanges(seqnames=c("chr1"), IRanges(start= 159500770, end=160000770))

exprs = import("expression_subset.bw", asRangedData=F) expression_plot = ggplot(exprs, aes(y=score)) + stat_aggregate(aes(y=score), window = 1000, geom = "bar") + scale_y_log10() + ylab("Read\nDepth")

genes = import("../illumina1_4c_run3/features.gtf.gz", which = gr, asRangedData=F) gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)

tracks(Genes=gene_plot,Expression=expression_plot, heights=c(3,1)) + theme_alignment(border=F, grid = F) + xlim(gr)

Here is my sessionInfo()

R version 2.15.2 (2012-10-26) Platform: x86_64-unknown-linux-gnu (64-bit)

locale: [1] C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] ggbio_1.6.6 ggplot2_0.9.3.1 rtracklayer_1.18.2 [4] GenomicRanges_1.10.7 IRanges_1.16.6 BiocGenerics_0.4.0

loaded via a namespace (and not attached): [1] AnnotationDbi_1.20.7 BSgenome_1.26.1 Biobase_2.18.0 [4] Biostrings_2.26.3 DBI_0.2-7 GenomicFeatures_1.10.2 [7] Hmisc_3.12-2 MASS_7.3-23 RColorBrewer_1.0-5 [10] RCurl_1.95-4.1 RSQLite_0.11.4 Rsamtools_1.10.2 [13] VariantAnnotation_1.4.12 XML_3.96-1.1 biomaRt_2.14.0 [16] biovizBase_1.6.2 bitops_1.0-6 cluster_1.14.4 [19] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 [22] grid_2.15.2 gridExtra_0.9.1 gtable_0.1.2 [25] labeling_0.2 lattice_0.20-15 munsell_0.4.2 [28] parallel_2.15.2 plyr_1.8 proto_0.3-10 [31] reshape2_1.2.2 rpart_4.1-1 scales_0.2.3 [34] stats4_2.15.2 stringr_0.6.2 tools_2.15.2 [37] zlibbioc_1.4.0

Hope this makes sense,

Ian

EDIT: apparently I can't attach these files here. Download them at http://www.cgat.org/~ians/ggbio_bug/expression_subset.bw http://www.cgat.org/~ians/ggbio_bug/features.gtf.gz

On 02/04/14 20:15, Tengfei Yin wrote:

Hi Ian,

Thanks a lot for reporting the bug and sorry for the late reply, would you mind give me a small subset of test data and example code so I could produce the error on my machine and fix it as soon as possible.

cheers

Tengfei

On Fri, Jan 24, 2014 at 11:02 AM, Ian Sudbery notifications@github.comwrote:

I also have this problem.

I have a BigWig file I would like to visualize, which shows the mapped read depth from an RNA-seq experiment. This file is one of the ENCODE rnaseq files. Importantly the bigwig only contains entries where there is expression. Bases with zero mapped depth are excluded.

I load a part of this up into R using rtracklayer's import command:

gr = GRanges(seqnames=c("chr1"), IRanges(from=159500770, to=160000770)) expres <- import("bigwig.bw", which = gr, asRangedData = F) expres GRanges with 218180 ranges and 1 metadata column

Note that the window is 1Mb big, but my GRanges object only has 218180 entries

I now want to plot this, but if I do it straight off, then the resulting computation takes forever, and would eventaully return more data than can be visualized. So aggregate over windows:

expression_plot = ggplot(expres, aes(y=score)) + stat_aggregate(aes(y=score),window=1000, geom="bar") + scale_y_log10()

  • ylab("Read\nDepth") + xlim(gr)

This plots the expected coverage plot at a suitable resolution.

The problem comes when I try to combine this with another track:

gene_plot <- ggplot(genes) + geom_alignment(aes(group=gene_id, fill = strand, color=strand),range.geom="arrowrect", group.selfish = F, extend.size =50000) + xlim(gr)

tracks(Genes=gene_plot,, Expression=expression_plot, heights=c(3,1)) + theme_alignment(border = F, grid = F) + xlim(gr) Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed

traceback() 5: scales::expand_range(getLimits(grob)$ylim, mul = 0.05) 4: FUN(X[[2L]], ...) 3: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 2: lapply(dots[!fixs & ideo], function(grob) { scales::expand_range(getLimits(grob)$ylim, mul = 0.05) }) 1: tracks(Genes = gene_plot, Expression = exp_plot, heights = c(3, 1))

I've managed to trace the problem to the fact that stat_aggregate returns NA values if there are no features in a particular window. This NA then propagates through to the data of the layer, and so getLimits returns NA, which is a problem for expand_ranges, although plotting the grob on its own seems fine.

I note the patch above, but my R development skills arn't good enough to apply it myself to our system.

Reply to this email directly or view it on GitHubhttps://github.com/tengfei/ggbio/issues/35#issuecomment-33234976 .

— Reply to this email directly or view it on GitHub https://github.com/tengfei/ggbio/issues/35#issuecomment-34101900.