Plot base level coverage

lcolladotor commented 7 years ago

I've received feedback from users interested in being able to plot the base-level coverage for a set of samples in recount2.

One way of doing this is via derfinderPlot::plotRegionCoverage(). I could write a function where the user specifies:

(1) a list of sample ids (run for SRA and GTEx, gdc_file_id for TCGA),
(2) a GRanges with the regions of interest (which would have to be short -- or the user would be warned) and
(3) colors and labels for each sample (ideally representing different groups).

Then internally run derfinder::loadCoverage() accessing the data from either local bigwigs (if at JHPCE, SciServer or if the user downloaded them) or via the web. Something that could be a bit of a bottle neck is the annotation needed for derfinderPlot::plotRegionCoverage(). That would normally involve creating a genomic state object from Gencode v25 hg38 which takes time even for one chromosome. I could pre-make them split by chr and have the function download the files from the web behind the scenes. In any case, right now this approach would have to deal with https://github.com/rafalab/bumphunter/issues/15 since derfinderPlot::plotRegionCoverage() requires the output of bumphunter::matchGenes().

Another option would be to open the data in the UCSC genome browser or something like that.

A third option could be to use epivizr http://bioconductor.org/packages/release/bioc/vignettes/epivizr/inst/doc/IntroToEpivizr.html. Although I'm not sure that it can read bigwig files from the web.

Thoughts @andrewejaffe @jtleek ?

Right now, users can get the list of URLs from recount::recount_url and load them into IGV or similar programs.

jtleek commented 7 years ago

I think we should talk to Hector Corrada Bravo about getting epivzr to do this - I don't think we should write a function here. I think we should write a tutorial. @lcolladotor

lcolladotor commented 7 years ago

I'm going to see what I can do with https://twitter.com/hcorrada/status/844896126578970625.

lcolladotor commented 7 years ago

Here's what I could get working with epivizr:

Code at https://gist.github.com/lcolladotor/d02064583733754dea7828a984378c96.

Some notes:

The epivizr window on Google Chrome didn't show the menus well.
The print function to save a screenshot in pdf format didn't work (only tried it from Firefox).
The default gene annotation in epivizr seems to be from hg19.
If the user moves the window, they'll have to re-import the coverage data and update the charts in epivizr.

lcolladotor commented 7 years ago

Ahh, I forgot to mention that I couldn't get this to work:

status <- colData(rse)$group
index <- split(seq(along = status), status)
logCounts <- log2(assays(rse)$counts + 1)
means <- sapply(index, function(ind) rowMeans(logCounts[, ind]))
rse_viz <- SummarizedExperiment(means, rowRanges = rowRanges(rse))

mean_plot <- app$plot(rse_viz, datasource_name = 'log 2 mean counts', columns = c('induced', 'uninduced'))

It's based on the epivizr vignette code.

lcolladotor commented 7 years ago

Hi,

Here is another option, using the UCSC genome browser. Note that the bigwig files in this case are not normalized to the same library size. So the visual comparison is not ideal. It's possible to do so, but that involves making new bigwig files with wiggletools or similar tools. The upside is that it's easy to change the window range. For example, here is a screenshot after I clicked on "zoom out 10x". Also, the user doesn't have to download any base-pair data from recount2.

You can see the data at https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr22%3A50434573-50664482&hgsid=586178827_e4XvaDgOXdWgRLFaaKkGl5DNYnP7.

The code is at https://gist.github.com/lcolladotor/c022472ca2fd7199e13fe61d0765096c and I used the options specified at https://ycl6.gitbooks.io/rna-seq-data-analysis/visualization.html for building the tracks file. Notably smoothingWindow=4.

If there are too many samples in a project, another option is to visualize the mean. We do have mean bigwig files (normalized to the same library size), so comparing data from different projects would be easy with either the UCSC genome browser or with epivizr.

Best, Leo

jtleek commented 7 years ago

I think we should write a tutorial that shows people how to:

(1) obtain bigwigs (2) create a granges object (3) make the base resolution plots we usually do

On Wed, Mar 29, 2017 at 9:18 AM Leonardo Collado-Torres < notifications@github.com> wrote:

Hi,

Here is another option, using the UCSC genome browser. Note that the bigwig files in this case are not normalized to the same library size. So the visual comparison is not ideal. It's possible to do so, but that involves making new bigwig files with wiggletools or similar tools. The upside is that it's easy to change the window range. For example, here is a screenshot after I clicked on "zoom out 10x". Also, the user doesn't have to download any base-pair data from recount2.

[image: screen shot 2017-03-29 at 11 10 54 am] https://cloud.githubusercontent.com/assets/2288213/24462034/f6403a4e-1470-11e7-93f2-2fb7fea44272.png

You can see the data at https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr22%3A50434573-50664482&hgsid=586178827_e4XvaDgOXdWgRLFaaKkGl5DNYnP7 .

The code is at https://gist.github.com/lcolladotor/c022472ca2fd7199e13fe61d0765096c and I used the options specified at https://ycl6.gitbooks.io/rna-seq-data-analysis/visualization.html for building the tracks file. Notably smoothingWindow=4.

If there are too many samples in a project, another option is to visualize the mean. We do have mean bigwig files (normalized to the same library size), so comparing data from different projects would be easy with either the UCSC genome browser or with epivizr.

Best, Leo

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/leekgroup/recount/issues/9#issuecomment-290123579, or mute the thread https://github.com/notifications/unsubscribe-auth/ABf7WtihrQzV2OlO2bZibE8dlbxi254rks5rqnZHgaJpZM4MloUf .

lcolladotor commented 7 years ago

http://bioconductor.org/packages/devel/bioc/html/wiggleplotr.html looks interesting (noticed via https://twitter.com/seandavis12/status/847628575339536386). From http://bioconductor.org/packages/devel/bioc/vignettes/wiggleplotr/inst/doc/wiggleplotr.html it looks like you can specify a scaling factor.

kauralasoo commented 7 years ago

Thanks @lcolladotor for linking me to this conversation. I think it would be a great application for wiggleplotr. Internally, wiggleplotr uses rtracklayer::BigWigSelection and rtracklayer::import.bw functions to import read coverage from bigWig files (see the readCoverageFromBigWig function in here: https://github.com/kauralasoo/wiggleplotr/blob/master/R/functions.R). If rtracklayer support remote bigBig files then they should also work with wiggleplotr.

Let me know if you have any questions regarding the usage of wiggleplotr and I'll be happy to help. I'm travelling until Tuesday but will have more time after that.

lcolladotor commented 7 years ago

The tutorial using derfinderPlot is part of https://github.com/LieberInstitute/recountWorkflow. I mentioned wiggleplotr and epivizR as other options. Showing how to use all the different tools could be a workflow itself, but for now that is out of the scope of the recount workflow I wrote. In any case, the workflow addresses this issue, so I'll close it for now.

leekgroup / recount

Plot base level coverage #9