renozao / FAQR

Frequently Asked Questions on R: my personal Ask Just Once system for my friends' R problems...

Get the feature names while using esApply #4

Open Rachelly opened 11 years ago

Rachelly commented 11 years ago

I'm using esApply to calculate the correlation between every pair of features in an E-set.

CreateCoXPRES_DB = function(eset, path) {
  esApply(eset, 1, FUN = function(x) {
    res = esApply(eset, 1, function(y) cor(x, y))
    out = paste(y, res, sep = "\t")
    print(x)
    write.table(out, paste(path, "x", sep = "\\"), quote = FALSE, append = TRUE)
  })
}

Since I want to track x and y in the function above, so that I know which genes each correlation applies to, I want to get the fData. The output of the above code is:

CreateCoXPRES_DB(mm_tmp, paste(path, "MM", sep="\\"))
GSM00001 GSM00002 GSM00003 GSM00004 GSM00005 GSM00006
6.901178 5.474642 5.404250 5.865132 5.786788 5.891266
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘fData’ for signature ‘"numeric"’

So it seems that inside esApply the rows are converted to plain numeric vectors, so the eset metadata is no longer accessible. The documentation I've looked at doesn't seem to address this issue: http://rgm3.lab.nig.ac.jp/RGM/R_rdfile?f=Biobase/man/esApply.Rd&d=R_BC

Is there a way to work around this, other than rewriting the function with a plain apply?

Thanks!
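
For readers hitting the same problem, here is a minimal sketch of one way to keep track of both features: loop over the feature names instead of the rows, and look each row up by name. It uses a made-up plain matrix in place of exprs(eset) so that it runs without Biobase; with a real ExpressionSet, exprs(eset) and featureNames(eset) would presumably play the same roles. The gene and sample names are invented for illustration.

```r
# Toy matrix standing in for exprs(eset): 3 features x 4 samples.
# All names and values are made up.
expr <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 9,
    4, 3, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Iterate over feature names rather than rows, so each call knows
# which feature it is working on.
pairs <- lapply(rownames(expr), function(gx) {
  # Correlate feature gx against every feature (every row of expr).
  # apply() keeps the row names as the names of the result.
  r <- apply(expr, 1, function(y) cor(expr[gx, ], y))
  data.frame(x = gx, y = names(r), cor = unname(r))
})
pairs <- do.call(rbind, pairs)
print(pairs)
```

Each row of pairs then carries both feature names next to the correlation, which is the bookkeeping the nested esApply call loses.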

renozao commented 11 years ago

I don't see the call to fData in the code you posted, but:

Tell me if the following gets what you want:

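CreateCoXPRES_DB <- function(eset, path, prefix = 'x'){
  # feature identifiers (assumes fData(eset) has an ENTREZID column)
  ids <- fData(eset)$ENTREZID
  # cor() works on columns, so transpose to correlate features (rows)
  res <- cor(t(exprs(eset)))
  dimnames(res) <- list(ids, ids)
  write.table(res, file = file.path(path, paste0(prefix, '.txt')), quote = FALSE)
}
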
Rachelly commented 11 years ago

I didn't know that cor calculates the correlation for every pair of columns! I thought it computed some global correlation over the whole matrix. So the line res <- cor(t(exprs(eset))) does the whole thing, including keeping the feature names.
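
A small sketch of that behaviour, using a made-up plain matrix in place of exprs(eset) (the gene and sample names are invented): cor() on a matrix returns one correlation per pair of columns and carries the column names over to both dimensions of the result.

```r
# Toy expression matrix: 3 features (rows) x 4 samples (columns).
# The gene and sample names are made up for illustration.
expr <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 8,
    4, 3, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("s1", "s2", "s3", "s4"))
)

# cor() correlates columns, so transpose to put features in columns.
res <- cor(t(expr))

print(dim(res))                # one row and one column per feature
print(dimnames(res))           # both dimensions carry the feature names
print(res["geneA", "geneB"])   # geneB is exactly 2 * geneA
print(res["geneA", "geneC"])   # geneC is geneA reversed
```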

I had thought of saving the pairwise correlations of each gene in separate files, since I have ~20K genes. I once tried to handle a 20K x 20K matrix in R and ran into serious memory problems that I couldn't overcome, even on the cluster. The surprising thing is that creating a 20K x 20K matrix with the above code causes no problem; I guess there's some optimization done there. Writing the table to a file does cause problems, though, so I'll just avoid it.
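
If the file is still wanted, one possible workaround is to compute and write the matrix a block of rows at a time, so the full table never has to be formatted in one go. This is only a sketch under assumptions: write_cor_in_chunks and chunk_size are hypothetical names, the input is a plain features-by-samples matrix such as exprs(eset) would give, and it has not been tried at the 20K scale. For a rough sense of size, a 20K x 20K matrix of doubles takes 20000 * 20000 * 8 bytes, about 3.2 GB, before any text conversion.

```r
# Hypothetical helper: write the feature-by-feature correlation matrix
# to a tab-separated file, chunk_size rows at a time.
write_cor_in_chunks <- function(expr, file, chunk_size = 1000) {
  samples_by_features <- t(expr)  # cor() correlates columns
  starts <- seq(1, nrow(expr), by = chunk_size)
  for (s in starts) {
    idx <- s:min(s + chunk_size - 1, nrow(expr))
    # Rows: features in this chunk; columns: all features.
    block <- cor(samples_by_features[, idx, drop = FALSE], samples_by_features)
    write.table(block, file = file, sep = "\t", quote = FALSE,
                append = s > 1,                         # first chunk overwrites
                col.names = if (s == 1) NA else FALSE)  # header only once
  }
  invisible(file)
}

# Usage on made-up data: 5 features x 6 samples.
set.seed(1)
expr <- matrix(rnorm(30), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))
out_file <- tempfile(fileext = ".txt")
write_cor_in_chunks(expr, out_file, chunk_size = 2)

# Read the file back and compare with the all-at-once result.
from_file <- as.matrix(read.table(out_file, header = TRUE, row.names = 1,
                                  sep = "\t"))
print(all.equal(unname(from_file), unname(cor(t(expr)))))
```

Only one block of chunk_size rows is held as a correlation matrix at any time, at the cost of calling cor() once per chunk.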

Thanks!

azk commented 11 years ago

Hallelujah! It's an R miracle...