TCGA CNV data comes as CNV segments. Each segment has a continuous value called the segment mean. However, biologists often refer to copy number in terms of discrete states: gain and loss. I am interested in the distribution of segment means for the TCGA pan-cancer cohort, as well as for individual TCGA cohorts. The distribution will help me determine the cutoffs for calling copy number gain or loss.
In a diploid genome, a single-copy gain in a perfectly pure, homogeneous sample has a copy ratio of 3/2. On the log2 scale, this is log2(3/2) = 0.585, and a single-copy loss is log2(1/2) = -1.0. However, most tumors are heterogeneous (multiple clonal tumor populations) and contain some normal stroma, which pulls the observed segment means toward 0. Therefore, the sample's purity needs to be considered so that alterations are not missed.
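The effect of purity on these values can be sketched with a simple mixing model. This is an illustration only: it assumes the normal cells are diploid, the tumor's baseline ploidy is also 2, and the tumor is a single clone. The function name `expected_log2_ratio` and its parameters are hypothetical and not part of any TCGA pipeline.

```python
import math


def expected_log2_ratio(tumor_copies, purity, normal_copies=2):
    """Expected log2 copy ratio of a segment in a tumor/normal mixture.

    Assumption: the sample is a mix of tumor cells (fraction `purity`)
    carrying `tumor_copies` copies and diploid normal cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Average number of copies per cell across the whole sample.
    mixed_copies = purity * tumor_copies + (1.0 - purity) * normal_copies
    if mixed_copies <= 0:
        # Homozygous deletion in a 100% pure sample: ratio is 0.
        return float("-inf")
    return math.log2(mixed_copies / normal_copies)


# Pure sample: matches the values quoted above.
pure_gain = expected_log2_ratio(3, purity=1.0)   # log2(3/2)
pure_loss = expected_log2_ratio(1, purity=1.0)   # log2(1/2)

# 50% purity: the same events give values much closer to 0.
half_gain = expected_log2_ratio(3, purity=0.5)   # log2(2.5/2)
half_loss = expected_log2_ratio(1, purity=0.5)   # log2(1.5/2)
```

Under this model, a single-copy gain at 50% purity gives log2(1.25), roughly 0.32, and a single-copy loss gives log2(0.75), roughly -0.42, so a cutoff chosen for pure samples would miss both.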
Generate the distribution of copy number segment means without adjusting for purity.
Generate the distribution of purity-adjusted copy number segment means.
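A minimal sketch of these two steps is shown below. It assumes segment means are log2 copy ratios and that a purity estimate between 0 and 1 is available for each sample. The adjustment formula is one simple choice (diploid normal cells, tumor baseline ploidy of 2), not necessarily the method used for this analysis, and the names `adjust_for_purity`, `histogram`, `floor` and `bin_width` are hypothetical.

```python
import math
from collections import Counter


def adjust_for_purity(segment_mean, purity, floor=-5.0):
    """Estimate the tumor-only log2 ratio from an observed segment mean.

    Assumption: observed_ratio = purity * tumor_ratio + (1 - purity) * 1,
    i.e. normal cells contribute a copy ratio of exactly 1.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    observed_ratio = 2.0 ** segment_mean
    tumor_ratio = (observed_ratio - (1.0 - purity)) / purity
    if tumor_ratio <= 0:
        # Noise can push the estimate below zero copies; clamp it.
        return floor
    return max(math.log2(tumor_ratio), floor)


def histogram(values, bin_width=0.1):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {round(k * bin_width, 10): counts[k] for k in sorted(counts)}


# Toy input: (segment mean, sample purity) pairs, made up for illustration.
segments = [(0.32, 0.5), (0.35, 0.5), (-0.41, 0.5), (0.02, 0.9)]

unadjusted = histogram(seg for seg, _ in segments)
adjusted = histogram(adjust_for_purity(seg, p) for seg, p in segments)
```

Comparing `unadjusted` and `adjusted` on real data would show whether the peaks for gain and loss move toward 0.585 and -1.0 after adjustment, which is what the cutoff choice depends on.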