mskcc / facets

Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.
144 stars 67 forks source link

Weird regions of homozygous deletion #152

Open af8 opened 4 years ago

af8 commented 4 years ago

I would like to understand a behaviour that seems to occur quite often with facets.

Here I show an example taken from a WES paired samples analysis.

rcmat = readSnpMatrixDT(snpfile)
# Pre-processing
xx    = preProcSample(rcmat, cval = 10, ndepth = 40, het.thresh=0.15, gbuild ='hg38', snp.nbhd = 250, deltaCN = 0, hetscale = FALSE)
# First round (high cval) to infer diploid level
oo1   = procSample(xx, cval = 300, min.nhet = 10, dipLogR = NULL)
fit1  = emcncf(oo1, trace=TRUE, unif=FALSE, min.nhet=10, maxiter=20, eps=1e-3)
# Second round (less stringent cval) and forcing diploid level from previous round for final model
oo2   = procSample(xx, cval = 150, min.nhet = 10, dipLogR = oo1$dipLogR)
fit2  = emcncf(oo2, trace=TRUE, unif=FALSE, min.nhet=10, maxiter=20, eps=1e-3)
# Plotting
purity=round(100*fit2$purity,3)
ploidy=round(fit2$ploidy,2)
text_title=paste("Fit2 -  Purity=", purity, "%; Ploidy=", ploidy, sep="")
plotSample(x=oo2, emfit=fit2, sname=text_title)

I have attached the final profile and here is an excerpt of fit2$cncf object (chr1 to chr8):

   chrom seg num.mark nhet cnlr.median          mafR segclust cnlr.median.clust    mafR.clust     start       end     cf.em tcn.em lcn.em
1      1   1     1274   98  0.10467567  0.0092296398       26        0.10006393  1.411796e-02     69610  10796635 0.3694622      3      1
2      1   2     7921  698 -0.15563691  0.0055080560       18       -0.13703726  2.790738e-03  10946611 159841149 1.0000000      2      1
3      1   3        5    2  0.35704606 -0.0462150825       33        0.38770348            NA 159854415 159858823 0.1277823      6     NA
4      1   4     3705  397 -0.19240227 -0.0015385511       15       -0.17879937  1.210780e-02 159872589 234373840 1.0000000      2      1
5      1   5      207    9  0.04765994  0.0010413871       25        0.03524168            NA 234393700 239678259        NA      2     NA
6      1   6      365   59 -0.69593747  1.8134798995        5       -0.69713223  1.813480e+00 239719265 248917704 0.7462907      1      0
7      2   7     8648  758 -0.12849425  0.0011245450       18       -0.13703726  2.790738e-03     41602 241253466 1.0000000      2      1
8      3   8     3939  345 -0.51095429  0.2017021050        8       -0.48732228  2.053418e-01    197118  85979245 0.4428092      1      0
9      3   9     3157  314 -0.12614004 -0.0014364067       18       -0.13703726  2.790738e-03  86937990 195774291 1.0000000      2      1
10     3  10       27   11 -0.71312736  0.0269026620        6       -0.69713223  2.690266e-02 195779039 195790387 0.3156750      0      0
11     3  11      172   17  0.04626998 -0.0248606166       24        0.03524168  4.804142e-05 195790682 198120832 0.1245663      4      2
12     4  12      548   55 -0.13386044  0.0089008243       18       -0.13703726  2.790738e-03     86054   3768314 1.0000000      2      1
13     4  13     3154  302 -0.34303241  0.0107392386       11       -0.34303241  1.073924e-02   3889882 167011252 0.1321878      0      0
14     4  14      380   34 -0.17127855 -0.0115668242       15       -0.17879937  1.210780e-02 167234027 189984136 1.0000000      2      1
15     5  15     3079  249 -0.10319288  0.0132801021       20       -0.09278793  2.126449e-03    140601 140705696 1.0000000      2      1
16     5  16       83   20  0.92763720 -0.0135349006       42        0.91157160 -1.353490e-02 140786241 140882583 0.5337619      6      3
17     5  17      264   49  0.59982938  0.0041763946       36        0.63009416  4.176395e-03 140883565 141491858 0.5430281      4      2
18     5  18     1998  175 -0.15175424 -0.0075482104       18       -0.13703726  2.790738e-03 141494781 181263936 1.0000000      2      1
19     6  19     6417  665 -0.08913815 -0.0072917310       20       -0.09278793  2.126449e-03    203548 170582241 1.0000000      2      1
20     7  20      734   77  0.49076343  0.1193743586       34        0.50596861  1.193744e-01    193673   6880279 0.2801692      6      2
21     7  21     5307  496  0.15304819  0.0700708916       27        0.16950833  8.648980e-02   6923825 149850370 0.2339899      4      1
22     7  22      523   60  0.38827018  0.0106956280       32        0.38770348  1.069563e-02 149859313 159232440 0.4310592      4      2
23     8  23      145   23  0.10989925  0.0117815646       26        0.10006393  1.411796e-02    213461   3219289 0.3694622      3      1
24     8  24     3482  351 -0.18225403  0.0018421828       15       -0.17879937  1.210780e-02   3230165 143158897 1.0000000      2      1
25     8  25      692   32  0.18356769 -0.0318314862       28        0.16950833  2.936435e-04 143213308 145056198 0.2427879      4      2

homodel_chr_case

The issue here is with chr4 status of homozygous deletion.

chr3 and chr8 display segments of heterozygous deletion (tcn, lcn)=(1,0) with different values of logR and logOR. If I understand this well, the difference is explained by the fact that the cellular fraction of both events is different, chr8 het del being more clonal than chr3 het del.

I guess that the interpretation here for chr4 is based on the same logic, meaning that chr4 homo del only happens in a very low proportion of tumour cells. But it seems to me that this region might also be merged to the normal (2,1) status even though there exists a non neglectable deviation from mean diploid logR level. What do you think ?

Extreme cases of this behaviour is when you have very noisy (wavy logR) data. In these cases you will observe over-segmentation of the data and applying a too low cval will end up in many (erroneous) short segments of homozygous deletion.

Considering all this, I wonder if you can elaborate a bit on the role of deltaCN parameter in preprocessing step and on how to use it. Using deltaCN=1 has given me quite good (pretty) results but I am not exactly sure about what I am doing here.

Thanks.

# packages versions (install from bioconda)
data.table_1.12.2 facets_0.5.14     pctGCdata_0.2.0