Closed by lcolladotor 7 years ago
Hi Steve,
If you use download_study('all', type = 'rse-gene') or download_study('all', type = 'rse-exon'), you can get the matrices for all of the SRA projects. With a bit of work, you can append the GTEx and TCGA data to get matrices for over 70k samples. Then use those matrices along with scale_counts() for the project you describe.
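For a single project, the download-then-scale step might look roughly like the sketch below. It assumes the Bioconductor packages recount and SummarizedExperiment are installed and that network access is available. The helper name gene_matrix_for_project and the project ID in the usage comment are illustrative and do not come from this thread.

```r
# Hypothetical helper (not part of recount): download one project's gene-level
# data and return a scaled genes x samples count matrix.
# Assumes recount and SummarizedExperiment are installed and that the
# machine has network access.
gene_matrix_for_project <- function(project, outdir = project) {
  # Saves <outdir>/rse_gene.Rdata
  recount::download_study(project, type = "rse-gene", outdir = outdir)

  # Load into a private environment; the file holds an object named rse_gene
  env <- new.env()
  load(file.path(outdir, "rse_gene.Rdata"), envir = env)

  # Scale the coverage-based counts so libraries are comparable
  rse_scaled <- recount::scale_counts(env$rse_gene)

  # Rows are genes, columns are samples
  SummarizedExperiment::assay(rse_scaled, "counts")
}

# Usage (illustrative project ID, not taken from this thread):
# expr <- gene_matrix_for_project("SRP009615")
# dim(expr)
```

The function is defined but not called, so nothing is downloaded until you run the usage lines yourself.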
At the exon-exon junction level you would have to download the files for each study, decide how to filter and then merge them (otherwise it gets very large very fast).
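As a rough illustration of the filter-then-merge idea, here is a base R sketch on toy matrices. The filter (minimum total count per junction), the threshold, and the function names are assumptions made for the example, not a recommendation from recount. Real junction data would likely need sparse matrices rather than dense ones.

```r
# Toy per-study junction count matrices (junctions x samples); values are made up.
study1 <- matrix(c(25, 0, 3, 30, 1, 0), nrow = 3,
                 dimnames = list(c("jx1", "jx2", "jx3"), c("s1", "s2")))
study2 <- matrix(c(12, 40, 0, 9, 35, 2), nrow = 3,
                 dimnames = list(c("jx1", "jx4", "jx5"), c("s3", "s4")))

# Hypothetical filter: keep junctions whose total count within the study
# reaches min_total. The threshold is arbitrary.
filter_junctions <- function(mat, min_total = 10) {
  mat[rowSums(mat) >= min_total, , drop = FALSE]
}

# Merge studies on the union of the surviving junctions, filling junctions
# that are absent from a study with zeros.
merge_junctions <- function(mats) {
  all_jx <- sort(unique(unlist(lapply(mats, rownames))))
  expanded <- lapply(mats, function(mat) {
    out <- matrix(0, nrow = length(all_jx), ncol = ncol(mat),
                  dimnames = list(all_jx, colnames(mat)))
    out[rownames(mat), ] <- mat
    out
  })
  do.call(cbind, expanded)
}

filtered <- lapply(list(study1, study2), filter_junctions)
merged <- merge_junctions(filtered)
merged
```

Filtering each study before merging keeps the union of junctions small, which is the point of the "decide how to filter" step.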
Best, Leonardo
See original at https://github.com/leekgroup/recount/pull/4.
@amadeusX posted this message:
Hi, Congratulations on the wonderful recount package and the huge dataset you compiled! We would like to use the normalized gene expression compendium (or, with a lot more effort, we can normalize it ourselves). Say the rows are genes and the columns are experiments, so the (i,j) element of the matrix is the transcript level of gene i in experiment j. We need this to identify generally co-expressed pairs of genes and, for the negative set, independently expressed gene pairs.
Thank you so much and Happy Holidays, Steve Istvan Ladunga, University of Nebraska-Lincoln
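To make the request concrete, here is a small base R sketch of how co-expressed and independently expressed pairs could be pulled from a genes x experiments matrix. The toy values, the log2 transform, the Spearman correlation, and the 0.8 / 0.2 cutoffs are all assumptions chosen for illustration. They are not the poster's method or anything recount prescribes.

```r
# Toy genes x experiments matrix (made-up values, for illustration only)
expr <- rbind(
  geneA = c(10, 20, 30, 40, 50),
  geneB = c(12, 19, 33, 41, 48),  # rises together with geneA
  geneC = c(50, 40, 30, 20, 10),  # falls as geneA rises
  geneD = c(30, 31, 29, 31, 30)   # no trend relative to the others
)

# Log-transform, then correlate genes across experiments.
# cor() works on columns, so transpose to put genes in columns.
log_expr <- log2(expr + 1)
cors <- cor(t(log_expr), method = "spearman")

# List each gene pair once (upper triangle of the correlation matrix)
idx <- which(upper.tri(cors), arr.ind = TRUE)
gene_pairs <- data.frame(
  gene1 = rownames(cors)[idx[, "row"]],
  gene2 = rownames(cors)[idx[, "col"]],
  rho = cors[idx],
  stringsAsFactors = FALSE
)

# Arbitrary cutoffs for the positive and negative sets
coexpressed <- gene_pairs[gene_pairs$rho >= 0.8, ]
independent <- gene_pairs[abs(gene_pairs$rho) <= 0.2, ]
```

With tens of thousands of genes the full correlation matrix becomes large, so in practice the pairs would probably need to be computed in blocks.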