hansenlab / minfi

Devel repository for minfi
58 stars 68 forks source link

replacement using pData() broken #108

Open kasperdanielhansen opened 7 years ago

kasperdanielhansen commented 7 years ago

From Tajuddin, Salman salman.tajuddin@nih.gov

I was going through your 450K data analysis with minif, located here http://kasperdanielhansen.github.io/genbioconductor/html/minfi.html and encountered an error when merging the GEO phenotype data with the methylation data. Not sure what went wrong here.

The error message reads like this:

> pData(rgSet) <- pD
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘pData<-’ for signature ‘"RGChannelSet", "data.frame"’

sessionInfo() is below for your reference

Thanks for your help.

Salman

image

apastore commented 7 years ago

for posterity. it need a DataFrame not data.frame

chrislun16 commented 7 years ago

What does that mean apastore? What do I need to change from data.frame to a DataFrame? And if you can, what's the difference between the two?

Edit: I figured it out. You have to do pD<-as(pD,"DataFrame")

kasperdanielhansen commented 7 years ago

A data.frame is a classic base R object. A DataFrame is a Bioconductor extension. The main difference is that columns of a DataFrame can be only S4 object with length. A main use case is IRanges and/or GRanges.

On Sat, Jul 8, 2017 at 12:53 AM, chrislun16 notifications@github.com wrote:

What does that mean apastore? What do I need to change from data.frame to a DataFrame? And if you can, what's the difference between the two?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kasperdanielhansen/minfi/issues/108#issuecomment-313811262, or mute the thread https://github.com/notifications/unsubscribe-auth/AEuhn1gGfHXWqsOeL9-NCPeAhHvD-OTbks5sLrbkgaJpZM4NLhjq .

chrislun16 commented 7 years ago

Do you know if the guide is still correct, because I seem to be getting incorrect beta values.

I was following the guide up until the preprocessing part. Meaning that I was able to obtain the RGset. Then I used my own preprocessing code to obtain the beta values. However, I cross checked them with the beta values that were on GEO and they weren't consistent.

For example, the accession number I used was GSE68777. So I went to that study on GEO and clicked on the first sample: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1681154. Then I scrolled down and clicked "Download full table" to download the samples and beta values in a text file.

Then I typed head(beta) and chose the first sample. Then I did command F in the text file for that sample and it's value there wasn't the same as the value from the beta data table. Hopefully you can help find the error.

Here is the code I'm using:

library(GEOquery)
library(minfi)
library("IlluminaHumanMethylation450kanno.ilmn12.hg19")
library("IlluminaHumanMethylation450kmanifest")

######## Code copied from 450k Guide ########

getGEOSuppFiles("GSE68777")
untar("GSE68777/GSE68777_RAW.tar", exdir = "GSE68777/idat")
head(list.files("GSE68777/idat", pattern = "idat"))
idatFiles <- list.files("GSE68777/idat", pattern = "idat.gz$", full = TRUE)
sapply(idatFiles, gunzip, overwrite = TRUE)
rgSet <- read.metharray.exp("GSE68777/idat")
geoMat <- getGEO("GSE68777")
pD.all <- pData(geoMat[[1]])
pD <- pD.all[, c("title", "geo_accession", "characteristics_ch1.1", "characteristics_ch1.2")]
names(pD)[c(3,4)] <- c("group", "sex")
pD$group <- sub("^diagnosis: ", "", pD$group)
pD$sex <- sub("^Sex: ", "", pD$sex)
sampleNames(rgSet) <- sub(".*_5", "5", sampleNames(rgSet))
rownames(pD) <- pD$title
pD <- pD[sampleNames(rgSet),]
pData(rgSet) <- pD
wvictor14 commented 7 years ago

It's not clear to me if those GEO sample submissions are processed or unprocessed beta values. Note that you did a normalization step:

MSet <- preprocessIllumina(rgSet, bg.correct = TRUE, normalize = "controls")

So your data is normalized. Why don't you compare the raw beta values to the GEO sample submissions? If those are the same, then good, those submissions are raw beta values. If they are not the same, then maybe they are processed betas and you'll need to find out their normalization step (details are missing in the article) in order to exactly reproduce their data.

chrislun16 commented 7 years ago

You are right. When I compared the raw beta values, they were very similar. Thanks for the help.