waldronlab / curatedTCGAData

Curated Data From The Cancer Genome Atlas (TCGA) as MultiAssayExperiment Objects
https://bioconductor.org/packages/curatedTCGAData

Provide Access to Latest TCGA Data #10

Closed DarioS closed 5 years ago

DarioS commented 6 years ago

The data currently available was processed before the mid-2016 harmonisation, which uses a newer reference genome and different analysis algorithms. The Genomic Data Commons data has also changed substantially since last year. For example, four months ago the mutation data was reprocessed to remove Oxo-G artefacts caused by the exome-seq kit.

lwaldron commented 6 years ago

This is a good comment and a worthwhile change. Unfortunately, I expect it will take a while to migrate the underlying downloader from RTCGAToolbox to the GenomicDataCommons library, since I assume a different set of quirks and inconsistencies will come up when integrating the data. @LiNk-NY, have you tried out Sean's GenomicDataCommons library (I mean just basic use, not as the curatedTCGAData downloader)?

LiNk-NY commented 6 years ago

I haven't really tried downloading a whole set of MultiAssayExperiment datasets, but I can see what the package has to offer.

LiNk-NY commented 5 years ago

This issue is outside the scope of the package as it currently stands. We'd have to write an interface to GenomicDataCommons and avoid conflating the two data sources.