waldronlab / curatedTCGAData

Curated Data From The Cancer Genome Atlas (TCGA) as MultiAssayExperiment Objects
https://bioconductor.org/packages/curatedTCGAData

Methylation data of type 'character'? #20

Closed by pdmoerland 5 years ago

pdmoerland commented 5 years ago

It seems that the methylation data available in MultiAssayExperiment objects from the curatedTCGAData package is of type character. I first noticed this with DLBC:

dlbc <- curatedTCGAData("DLBC", assays = c("RNASeq2GeneNorm", "Methylation"), dry.run = FALSE)
head(assays(dlbc)[["DLBC_Methylation-20160128"]])[,1:2]
CpG TCGA-FA-8693-01A-11D-2399-05 TCGA-FA-A4BB-01A-11D-A31Y-05
cg00000029 "0.856505140855593" "0.334665025550734"
cg00000108 NA NA
cg00000109 NA NA
cg00000165 "0.666199142688295" "0.280359782546829"
cg00000236 "0.929439802825497" "0.886613067207175"
cg00000289 "0.683339480636632" "0.619282785153004"

As expected, this also seems to be the case for the other tumor types. I suppose this is caused by the NAs present in the data? Is there an easy fix?

For the moment I work around this by converting from character to numeric in a rather inelegant way:

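# Extract the methylation assay (a character matrix) and convert it to numeric
obj <- assays(dlbc)[["DLBC_Methylation-20160128"]]
storage.mode(obj) <- "numeric"
# Rebuild the MultiAssayExperiment around the converted matrix
dlbc <- MultiAssayExperiment(experiments = list("DLBC_Methylation-20160128" = obj,
                                                "DLBC_RNASeq2GeneNorm-20160128" = assays(dlbc)[[2]]),
                             colData = colData(dlbc),
                             sampleMap = sampleMap(dlbc),
                             metadata = metadata(dlbc))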
head(assays(dlbc)[["DLBC_Methylation-20160128"]])[,1:2]
CpG TCGA-FA-8693-01A-11D-2399-05 TCGA-FA-A4BB-01A-11D-A31Y-05
cg00000029 0.8565051 0.3346650
cg00000108 NA NA
cg00000109 NA NA
cg00000165 0.6661991 0.2803598
cg00000236 0.9294398 0.8866131
cg00000289 0.6833395 0.6192828
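
For anyone who wants to try the character-to-numeric conversion step without downloading the data, here is a minimal base R sketch on a toy matrix. The sample names (`sample_A`, `sample_B`) and the variable names are made up for illustration, and the toy matrix only mimics the shape of the methylation assay; it says nothing about why the packaged data was stored as character.

```r
# Toy stand-in for the methylation assay: a character matrix with missing values.
# Sample names and variable names are hypothetical, for illustration only.
beta_chr <- matrix(
  c("0.856505140855593", NA, "0.666199142688295",
    "0.334665025550734", NA, "0.280359782546829"),
  nrow = 3,
  dimnames = list(c("cg00000029", "cg00000108", "cg00000165"),
                  c("sample_A", "sample_B"))
)
typeof(beta_chr)   # "character"

# Change the storage mode on a copy; dim and dimnames are preserved,
# and missing entries stay NA.
beta_num <- beta_chr
storage.mode(beta_num) <- "numeric"
typeof(beta_num)   # "double"

# Missing values alone do not turn a numeric matrix into a character one.
typeof(matrix(c(0.5, NA), nrow = 1))   # "double"
```

The last line suggests that the NAs by themselves need not be the cause: a matrix built from numeric values and NA stays numeric in R.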

LiNk-NY commented 5 years ago

Hi Perry, @pdmoerland

Yes, we are currently working on this. We are going to use the DelayedArray package for numeric methylation matrices this big.

Regards, Marcel

pdmoerland commented 5 years ago

Hi Marcel, Cool! Sounds like a very good idea, looking forward to seeing this feature being implemented. Best, Perry

LiNk-NY commented 5 years ago

Hi Perry, @pdmoerland

The latest iteration of curatedTCGAData has resolved this issue. Give it about 24 hours for the Bioconductor release builds to propagate the changes. If you would rather not wait, please use the Bioconductor devel version of curatedTCGAData.

Thank you for your patience.

Best, Marcel

pdmoerland commented 4 years ago

Hi Marcel, Thanks a lot, works like a charm! Best, Perry