leekgroup / recount

R package for the recount2 project. Documentation website: http://leekgroup.github.io/recount/
https://jhubiostatistics.shinyapps.io/recount/

Transcript level matrix for thousands of experiments #7

Closed by lcolladotor 7 years ago

lcolladotor commented 7 years ago

See original at https://github.com/leekgroup/recount/pull/4.


@amadeusX posted this message:

Hi, Congratulations on the wonderful recount package and the huge dataset you compiled! We would like to use the normalized gene expression compendium (or, with a lot more effort, we can normalize it ourselves). Say the rows are the genes and the columns are the experiments, so the (i,j) element of the matrix is the transcript level of gene i in experiment j. We need this to identify generally co-expressed pairs of genes and, for the negative set, independently expressed gene pairs.

Thank you so much and Happy Holidays, Steve Istvan Ladunga, University of Nebraska-Lincoln

lcolladotor commented 7 years ago

Hi Steve,

If you use download_study('all', type = 'rse-gene') or download_study('all', type = 'rse-exon') you can get the matrices for all of the SRA projects. With a bit of work, you can append the GTEx and TCGA data to get the matrices for over 70k samples. Then use those matrices along with scale_counts() for the project you describe.
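For a single project, the download, load and scale steps might look roughly like the sketch below. It assumes the Bioconductor packages recount and SummarizedExperiment are installed and that the downloaded file is rse_gene.Rdata containing an object named rse_gene, as in the recount quick start. The helper name scaled_gene_counts() is hypothetical and not part of recount; appending GTEx and TCGA is not shown.

```r
# Sketch only: needs the Bioconductor packages recount and
# SummarizedExperiment, plus network access when the function is called.
# scaled_gene_counts() is a hypothetical helper, not a recount function.
scaled_gene_counts <- function(project,
                               outdir = file.path(tempdir(), project)) {
    # Download the gene-level RangedSummarizedExperiment for one project
    recount::download_study(project, type = "rse-gene", outdir = outdir)

    # Assumption: the file holds a single object called rse_gene
    env <- new.env()
    load(file.path(outdir, "rse_gene.Rdata"), envir = env)

    # Scale the coverage-based counts to a common target library size
    rse_scaled <- recount::scale_counts(env$rse_gene)

    # Return the matrix with genes in rows and samples in columns
    SummarizedExperiment::assay(rse_scaled, "counts")
}
```

Calling scaled_gene_counts("SRP009615") (the project id used in the recount vignette, chosen here only as an example) should return one such gene-by-sample matrix.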

At the exon-exon junction level you would have to download the files for each study, decide how to filter and then merge them (otherwise it gets very large very fast).
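The filter-then-merge idea can be illustrated on toy data. The sketch below uses small made-up count matrices in base R rather than real recount junction objects; the study names, the min_total threshold and both helper functions are assumptions for illustration only.

```r
# Toy junction-by-sample count matrices for two hypothetical studies
study_a <- matrix(c(10, 0, 1, 12, 0, 0), nrow = 3,
    dimnames = list(c("jx1", "jx2", "jx3"), c("a1", "a2")))
study_b <- matrix(c(7, 3, 9, 4), nrow = 2,
    dimnames = list(c("jx1", "jx4"), c("b1", "b2")))

# Keep junctions whose total count within a study reaches a chosen
# threshold (the cutoff of 5 is arbitrary here)
filter_junctions <- function(counts, min_total = 5) {
    counts[rowSums(counts) >= min_total, , drop = FALSE]
}

# Merge two filtered studies on the union of their junctions,
# filling junctions missing from a study with zeros
merge_junctions <- function(x, y) {
    all_jx <- union(rownames(x), rownames(y))
    merged <- matrix(0, nrow = length(all_jx), ncol = ncol(x) + ncol(y),
        dimnames = list(all_jx, c(colnames(x), colnames(y))))
    merged[rownames(x), colnames(x)] <- x
    merged[rownames(y), colnames(y)] <- y
    merged
}

merged <- merge_junctions(filter_junctions(study_a),
                          filter_junctions(study_b))
```

Filtering before merging keeps the union of junctions small. One caveat of this design is that zero-filling cannot distinguish a junction that was absent in a study from one that was filtered out of it.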

Best, Leonardo