mksamur / RTCGAToolbox


getSurvival() may result in a strange plot because sample types are not filtered. #27

Closed Tyelcie closed 4 years ago

Tyelcie commented 6 years ago

Hi Samur,

I'm confused by a difference between the result of getSurvival() and my own method.

I extracted the Primary Tumor samples from mRNAArray and merged them with the clinical data, giving a subset that contains clinical vitalstatus and follow-up times together with mRNAArray expression from primary tumor samples only. I then followed the usual method to draw a KM plot with the survival package. The result is different from that of getSurvival().

I looked at the code in getSurvival(), and I think the difference most likely comes from which samples are selected.

sampleIDs1 <- paste(samplesDat[, 1], samplesDat[, 2], samplesDat[, 3], samplesDat[, 4], sep = "-")
sampleIDs11 <- paste(samplesDat[, 1], samplesDat[, 2], samplesDat[, 3], sep = "-")
colnames(tmpMat1) <- sampleIDs1
tmpMat1 <- tmpMat1[, !duplicated(sampleIDs11)]
colnames(tmpMat1) <- sampleIDs11[!duplicated(sampleIDs11)]

This code removes duplicated participants, but the sample kept for each participant can be either tumor or normal.

Taking the BRCA dataset as an example, the following returns the sampleIDs1 entries that remain after duplicated participants are removed:

sampleIDs1[!duplicated(sampleIDs11)]

It does return a mix of sample types. I think this is because duplicated() marks only the later occurrences as duplicates (TRUE), while the tumor samples are not ordered first in the mRNAArray matrix.
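A small toy example may make this behaviour easier to see. The barcodes below are invented for illustration and are not taken from the BRCA data; the sketch also assumes that the fourth barcode field carries the TCGA sample type code (01 for primary tumor, 11 for solid tissue normal).

```r
# Hypothetical barcodes: the normal sample (11A) of participant A0B1 comes first.
barcodes <- c("TCGA-A1-A0B1-11A", "TCGA-A1-A0B1-01A", "TCGA-A2-A0C2-01A")

# Split the barcodes into fields and build the participant-level ID
# from the first three fields, mirroring the paste() calls in getSurvival().
fields <- do.call(rbind, strsplit(barcodes, "-", fixed = TRUE))
participant <- paste(fields[, 1], fields[, 2], fields[, 3], sep = "-")

# duplicated() flags only the later occurrences, so whichever sample
# happens to come first in the matrix is the one that is kept.
dup <- duplicated(participant)
kept <- barcodes[!dup]
print(kept)
```

In this toy case the normal sample of participant A0B1 is the one retained, simply because it appears before the tumor sample.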

So wouldn't participants be grouped incorrectly when different sample types are used to calculate the median (or quartiles) of gene expression?

Looking forward to your reply! Thanks in advance!
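One possible workaround, sketched here on toy data, is to filter to primary tumor samples before removing duplicated participants. This is not the package's implementation: the matrix, the gene name GENE1, and the barcodes are hypothetical, and the sketch assumes that characters 14-15 of a TCGA barcode hold the sample type code.

```r
# Toy expression matrix: one gene, columns named by hypothetical TCGA barcodes.
expr <- matrix(c(5.1, 9.7, 8.2, 7.5), nrow = 1,
               dimnames = list("GENE1",
                               c("TCGA-A1-A0B1-11A", "TCGA-A1-A0B1-01A",
                                 "TCGA-A2-A0C2-01A", "TCGA-A2-A0C2-01B")))

# Assumption: characters 14-15 of the barcode are the sample type code,
# where "01" means primary tumor.
sample_type <- substr(colnames(expr), 14, 15)
tumor <- expr[, sample_type == "01", drop = FALSE]

# Remove duplicated participants only after the sample type filter,
# then rename the columns to the 12-character participant ID.
participant <- substr(colnames(tumor), 1, 12)
tumor <- tumor[, !duplicated(participant), drop = FALSE]
colnames(tumor) <- substr(colnames(tumor), 1, 12)

# Median split computed on tumor-only expression values.
values <- tumor["GENE1", ]
group <- ifelse(values > median(values), "high", "low")
print(group)
```

With the filter applied first, the normal sample of participant A0B1 no longer influences the median or the grouping.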

LiNk-NY commented 4 years ago

Hi @Tyelcie, unfortunately this function is being deprecated in this release cycle. If you or someone you know would like to maintain the functions listed in the NEWS.md file, please create another package that depends on this one. Best, Marcel