waldronlab / curatedTCGAData

Curated Data From The Cancer Genome Atlas (TCGA) as MultiAssayExperiment Objects
https://bioconductor.org/packages/curatedTCGAData
41 stars 7 forks source link

Simplified methylation data #34

Open pcheng84 opened 4 years ago

pcheng84 commented 4 years ago

Hi Levi and Marcel,

I was wondering if it would be possible to implement a simplified methylation data set. As you know the raw methylation data is quite unwieldy so I condensed it to a matrix, where each row is the median B value for the CpG island. I took the annotation from the IlluminaHumanMethylation450kanno.ilmn12.hg19 library.

here is the code I used to make the simplified methylation values.

library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
library(curatedTCGAData)
library(data.table)
lusc <- curatedTCGAData("LUSC", "Methylation_methyl450", FALSE)

#Get Illumina methylation island data
illumina <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)
illumina <- data.table("REF" = rownames(illumina),
                     "Chr" = illumina@listData$chr,
                     "Loc" = illumina@listData$pos,
                     "Island" = illumina@listData$Relation_to_Island,
                     "Island_Name" = illumina@listData$Islands_Name,
                     "Gene" = illumina@listData$UCSC_RefGene_Name,
                     "UCSC_RefGene_Group" = illumina@listData$UCSC_RefGene_Group
  )
setkey(illumina, "Island")
illumina <- illumina["Island"]

#Extract methylation data for all genes, then
#limit to loci with Island annotation from Illumina
meth_assay_num <- grep("Methylation", names(mae))
meth <- mae[[meth_assay_num]]
meth <- meth[illumina$REF,]

#Merge methylation data with annotation data
merged <- as.data.table(assay(meth))
merged[, "REF" := illumina$REF]
merged <- merge(merged, illumina, by = "REF")

#Calculate the median methylation value per island
methylisland <- merged[, lapply(.SD, function(x) median(x, na.rm=T)),
                         .SDcols = !c("REF", "Chr", "Loc", "Island", "Gene", "UCSC_RefGene_Group"), 
                         by = Island_Name]
methylisland <- na.omit(methylisland)

methylisland2 <- as.matrix(methylisland[, setdiff(colnames(methylisland), "Island_Name"), with = FALSE])
rownames(methylisland2) <- methylisland$Island_Name
#Append expression level matrix to original MAE object
mae2 <- c(mae, SimpleMethyl = methylisland2, mapFrom = meth_assay_num)

Cheers, Phil

lwaldron commented 4 years ago

Thanks for the code, @pcheng84! Yes it would be convenient to have a simplified methylation version pre-computed. @LiNk-NY, it looks like the c("REF", "Chr", "Loc", "Island", "Gene", "UCSC_RefGene_Group") (and possibly other?) columns would be convenient to have as rowData and rowRanges in a RangedSummarizedExperiment.

pcheng84 commented 4 years ago

Those are the columns I use for the annotation of the islands. I think there are more columns in the listData tables of the IlluminaHumanMethylation450kanno.ilmn12.hg19 annotation object you could add to rowData.

I made a small mistake in my code

lusc <- curatedTCGAData("LUSC", "Methylation_methyl450", FALSE)

should be

mae <- curatedTCGAData("LUSC", "Methylation_methyl450", FALSE)
LiNk-NY commented 4 years ago

Hi Phil, @pcheng84 Thank you for pointing this out and providing code to work with. I think this would be a good addition as an add-on type of package that can optionally replace methylation datasets from curatedTCGAData. Currently, it would take me quite some time to re-run the pipeline and integrate the datasets. It would be easier to have a separate ExperimentHub package. Let me know what you think. Thanks!

Best, Marcel

pcheng84 commented 4 years ago

Hi Marcel, @LiNk-NY

An ExperimentHub package would work nicely. How should we proceed with this?

Cheers, Phil

LiNk-NY commented 4 years ago

Hi Phil, @pcheng84 I was thinking we could test-drive functionality that Kayla @kayla-morrell has been working on to make package creation and resource upload easier.

There is a branch called constructHubFunctions in AnnotationHubData that lets you run hub_create_package(). I will look into it as well in the coming days.

https://github.com/Bioconductor/AnnotationHubData/tree/constructHubFunctions

Best, Marcel