mskcc / facets

Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.

ploidy and copy number states very different between SNParray (ASCAT) and WGS (FACETS) #113

Open ahwanpandey opened 5 years ago

ahwanpandey commented 5 years ago

Hi,

I have run FACETS on a set of WGS samples (tumor depth = 80x, normal = 40x) that also have SNP array data, and I am seeing large ploidy differences between the two assays for some samples:

ASCAT vs FACETS ploidy scatter:

image

The purity estimates are generally similar:

ASCAT vs FACETS purity scatter:

image

I am picking one of the samples to compare their copy number profiles across the assays:

FACETS (WGS): ploidy = 4.27, purity = 0.71

image

ASCAT (SNParray): ploidy = 2.42, purity = 0.81

image

The settings I used here for FACETS are CVAL [50 1000] and SNP.NBHD [1000], but the results are essentially the same with CVAL [25 500] and SNP.NBHD [500].

As you can see, the copy number estimates are very different for this sample as produced by ASCAT and FACETS.

Would you be able to help me understand these differences?

Thanks so much.

veseshan commented 5 years ago

The regression line in the scatterplot is distracting. The x=y line is better, since you want the two estimates to be similar. In any case, for the specific example you show, it seems that facets thinks there is genome doubling, which is not obvious from just looking at the figure. You can look at the flags object in the procSample output to see if that gives you a reason. Also, if you want to share the segment summary, i.e. the out data frame in the procSample output, it can help us make the algorithm more robust. Thanks.
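As a rough illustration of why the x=y line is the more useful reference, the sketch below measures how far paired estimates sit from the identity line instead of fitting a regression. The helper name `identity_deviation` is hypothetical, and the only inputs are the ploidy and purity values quoted for the single sample in this thread.

```python
import math

def identity_deviation(x, y):
    """Return signed differences y - x and their root-mean-square.

    A regression line can fit well while sitting far from y = x;
    these differences measure disagreement between the two assays directly.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [b - a for a, b in zip(x, y)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs)) if diffs else 0.0
    return diffs, rms

# The one sample reported in this thread, as (ASCAT, FACETS).
ploidy_diffs, ploidy_rms = identity_deviation([2.42], [4.27])
purity_diffs, purity_rms = identity_deviation([0.81], [0.71])
print("ploidy difference:", round(ploidy_diffs[0], 2))
print("purity difference:", round(purity_diffs[0], 2))
```

With the full cohort, passing all ASCAT values as `x` and all FACETS values as `y` gives one number per sample to inspect, which is the same information the x=y line shows graphically.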

ahwanpandey commented 5 years ago

Here's what the flags object contains:

[1] "mafR larger than expected if -0.0755933169799341 is diploid level"                                                              
[2] "not consistent for 1 copy loss from diploid in segclust: 17, 18, 19, 25, 29, 34, 35, 36, 38, 39, 40, 41, 42, 43, 45, 46, 47, 49"

Also, I have uploaded the "jointseg" and "out" dataframes from procSample in Dropbox as they are too large to attach here:

https://www.dropbox.com/sh/z8jsum2pqmlt0ku/AADAmI8Gaq16kj4RciM_vOr3a?dl=0

Thanks for your input.

veseshan commented 5 years ago

The facets fit you see can be explained with the following spider plots (logRlogORspider function). The algorithm notes that alleles are balanced at -0.071; however, it is not choosing that as the diploid level, since chromosomes 9q and 17 have lower log-ratio values (both around -0.45) and very different allelic imbalances (mafR of 0.27 vs 4.3). So if -0.071 is the diploid level, both cannot be single copy losses. In the spider plot you can see all the points on the right of the graph, between the 1-0 and 2-0 lines.
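The reasoning can be sketched numerically. This is a minimal illustration, assuming the usual purity-mixing model in which a segment with major/minor copy numbers (m, p) at tumor purity φ has expected logR offset log2((φ(m+p) + 2(1-φ))/2) from the diploid level and expected |logOR| of log((φm + 1-φ)/(φp + 1-φ)). The function names are hypothetical and this is not the FACETS code. It also assumes purity < 1, so that a fully lost allele still has normal-cell copies.

```python
import math

def expected_logr(major, minor, purity):
    """Expected logR minus the diploid level for a (major, minor) state."""
    total = purity * (major + minor) + 2.0 * (1.0 - purity)
    return math.log2(total / 2.0)

def expected_logor(major, minor, purity):
    """Expected |logOR| at heterozygous SNPs; requires purity < 1."""
    return math.log((purity * major + 1.0 - purity) /
                    (purity * minor + 1.0 - purity))

def allelic_states(total_copies):
    """All (major, minor) splits of a total copy number with major >= minor."""
    return [(total_copies - m, m) for m in range(total_copies // 2 + 1)]

purity = 0.71  # FACETS purity estimate quoted earlier in the thread
for tcn in (1, 2, 3):
    for major, minor in allelic_states(tcn):
        print(tcn, f"{major}-{minor}",
              round(expected_logr(major, minor, purity), 3),
              round(expected_logor(major, minor, purity), 3))
```

Under this model a total copy number of 1 has a single allelic state (1-0), so two segments at the same logR with very different allelic imbalance cannot both be single copy losses. States with higher total copy number admit several splits, each with its own expected |logOR|.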

pandey-spider

Seems like chr17 is a mixture of 1-0 and 2-0 clones, and we need a way to fit that in facets robustly. The current fit (figure on the left) calls the sample genome doubled, which moves all the points, bringing them closer to the lines.
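To see why a mixture of 1-0 and 2-0 clones would land between the two lines, here is a hedged sketch under the same purity-mixing assumption: a fraction of tumor cells carries a single copy loss (1-0) and the rest carries copy-neutral LOH (2-0). The function `mixture_point` and its parameters are hypothetical illustrations, not part of facets.

```python
import math

def mixture_point(purity, frac_loss):
    """Expected (logR - dipLogR, |logOR|) for a segment where a fraction
    `frac_loss` of tumor cells is 1-0 and the remainder is 2-0.
    Requires 0 < purity < 1 and 0 <= frac_loss <= 1."""
    # Average copies of the retained allele per tumor cell:
    # 1 in the 1-0 clone, 2 in the 2-0 clone.
    retained = frac_loss * 1.0 + (1.0 - frac_loss) * 2.0
    a = purity * retained + (1.0 - purity)  # retained allele, incl. normal cells
    b = 1.0 - purity                        # lost allele: normal cells only
    return math.log2((a + b) / 2.0), math.log(a / b)

purity = 0.71  # FACETS purity estimate quoted earlier in the thread
for frac_loss in (0.0, 0.25, 0.5, 0.75, 1.0):
    logr, logor = mixture_point(purity, frac_loss)
    print(frac_loss, round(logr, 3), round(logor, 3))
```

As `frac_loss` goes from 0 to 1, the expected point moves continuously from the pure 2-0 position to the pure 1-0 position, so intermediate mixtures fall between the two lines rather than on either of them.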