peterawe / CMScaller

GNU General Public License v3.0

Gene identifiers #3

Closed: jettmcp closed this issue 5 years ago

jettmcp commented 5 years ago

Hi,

I am getting the error below and can't figure out why. As far as I can tell, the counts file does have Entrez-style IDs as row names. I wasn't sure how to check what the template contains. Also, the vignette mentions setting a parameter to use other gene identifiers, but it was unclear to me how to do that.

Thanks

John

res <- CMScaller(counts, RNAseq=TRUE, FDR=0.05)
performing log2-transform and quantile normalization...
17500/17509 rownames(emat) failed to match to human gene identifiers
cosine correlation distance
530/530 templates features not in emat, discarded
<2 matched features/class
Error: check templates$probe is matchable against rownames(emat)
In addition: Warning message:
verify that rownames(emat) are entrez

head(counts[,1:2])
          TCGA.3L.AA1B.01A.11R.A37K.07 TCGA.4N.A93T.01A.11R.A37K.07
100133144                         3937                         6291
100134869                         3492                         5980
10357                             3261                         3061
10431                             1380                         2518
155060                            6576                          235
26823                             4188                            4

peterawe commented 5 years ago

Hi. Could you first check whether the included example data works as expected by running

library(CMScaller)
example("CMScaller")

You should see two plots and get (the head of) a data.frame with the resulting predictions. If that works, I would guess something is off in the input data. Make sure that the data include coding genes and that the row names are indeed Entrez IDs.

The included templates.CMS data.frame holds the gene symbols and Entrez IDs used for making the predictions. You could check how many of the relevant genes are included in your count data by running

## assuming you have loaded your counts data matrix
table(rownames(counts) %in% templates.CMS$probe, useNA="always")
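A slightly extended version of the same check can also list which template genes are missing, not just how many. This is only an illustrative sketch: `checkTemplateOverlap` is a hypothetical helper, not a CMScaller function, and the toy `toyCounts`/`toyTemplates` objects stand in for a real count matrix and `templates.CMS` (the only assumption about the templates is the `probe` column used above).

```r
# Hypothetical helper (not part of CMScaller): summarise how many template
# genes are present among the row names of a count matrix.
checkTemplateOverlap <- function(counts, templates) {
  probes <- unique(as.character(templates$probe))  # template gene IDs
  found <- probes %in% rownames(counts)            # TRUE where the ID is a row name
  list(
    nTemplate = length(probes),
    nFound    = sum(found),
    missing   = probes[!found]
  )
}

# Toy stand-ins; with real data pass your counts matrix and templates.CMS
toyTemplates <- data.frame(probe = c("10357", "10431", "999999"),
                           class = c("CMS1", "CMS2", "CMS3"),
                           stringsAsFactors = FALSE)
toyCounts <- matrix(1:4, nrow = 2,
                    dimnames = list(c("10357", "10431"), c("s1", "s2")))

ov <- checkTemplateOverlap(toyCounts, toyTemplates)
print(ov$nFound)   # [1] 2
print(ov$missing)  # [1] "999999"
```

With real data, an `nFound` close to zero (compare the "530/530 templates features not in emat" message in the original output) suggests the row names are not Entrez IDs.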

Please let me know if you're still not able to sort it out!

jettmcp commented 5 years ago

Thanks for the quick response. The example works fine and produces nice results. Running table(rownames(counts) %in% templates.CMS$probe, useNA="always") gave me a handle on the problem and I was able to sort it out. I had made an error in pulling the Entrez IDs out of the file (TCGA RNA-seq data). In the end I was able to run the TCGA data, with some massaging. Thanks again.

John
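The thread does not say how the Entrez IDs were extracted. As one possible illustration, the sketch below assumes (not stated in the thread) that the row labels follow the legacy TCGA RNASeqV2 `SYMBOL|ENTREZID` convention, for example `A1BG|1`; `entrezFromTcgaLabels` is a hypothetical helper, and the pattern would need adjusting for other file layouts.

```r
# Hypothetical helper. Assumes labels of the form "SYMBOL|ENTREZID"
# (e.g. "A1BG|1"); adjust the pattern if your file is laid out differently.
entrezFromTcgaLabels <- function(labels) {
  # drop everything up to and including the last "|", keeping the Entrez ID
  sub("^.*\\|", "", labels)
}

labels <- c("A1BG|1", "?|100130426", "TP53|7157")
ids <- entrezFromTcgaLabels(labels)
# ids is c("1", "100130426", "7157")

# Applied to a count matrix, check the new row names are unique before use:
# rownames(counts) <- entrezFromTcgaLabels(rownames(counts))
# stopifnot(!anyDuplicated(rownames(counts)))
```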

peterawe commented 5 years ago

Great! Best of luck with your research.

Peter