tgac-vumc / ACE

Absolute Copy Number Estimation using low-coverage whole genome sequencing data
14 stars 6 forks source link

ACEcall - discrepancies between Segment_Mean and Segment_Mean2 #5

Open marromesc opened 2 years ago

marromesc commented 2 years ago

Hi,

First of all, thank you for your amazing efforts to develop the package.

I wonder if you've investigated further the discrepancies between the adjusted segment mean calculated from the QDNAseq value and the segment mean manually calculated from the adjusted copy number values of the individual bins. This difference is pretty marginal as you mentioned in your vignette for most of the segments. However, I've got a number of DCIS samples with Her2 positiveness data from immunohistochemistry. I would expect amplification for Her2+ DCIS samples in the segment where the gene is located but is called neutral by ACEcall due to the first segment mean even if Segment_Mean2 number of copies is higher than 5 - see below examples sample1 and sample3. I would trust here in the segment mean you've calculated manually. Why did you decide to go with the QDNAseq value? Could I change this on ACEcall to give it Segment_Mean2 instead?

Another matter is example sample2 which is called neutral by ACE but both calculations of a number of copies in Segment_Mean and Segment_Mean2 are higher than 5.

Chromosome | Start | End | Num_Bins | Segment_Mean | Segment_Mean2 | Segment_SE | Copies | P_log10 | calls | sample 17 | 37600001 | 40100000 | 19 | 1.84191206 | 13.4645265 | 3.89641465 | 13 | -0.043 | 0 | sample1 17 | 38600001 | 39800000 | 12 | 6.191005017 | 11.1747818 | 2.95776208 | 11 | -0.021 | 0 | sample2 17 | 39300001 | 41000000 | 17 | 0.95824526 | 18.2131214 | 5.01842878 | 18 | -0.015 | 0 | sample3

jpoell commented 1 year ago

Hi, very sorry for the late response, I just noticed I did not have notifications turned on for issues. I must say I have never seen such big differences between the two. To better figure out what is going on I would suggest to observe what happens to the bin values of all resp 19, 12, and 17 bins in these samples. To see the QDNAseq output on these bins, you can do temp <- objectsampletotemplate(object, 1); temp[temp$chr == 17 & temp$start > 37600000 & temp$end < 40100001, ] In focal amplifications you commonly observe that the segment boundaries are not very accurate, which also explains the high standard error of these segments. I am very confused though by the differences you observe, so I would first manually explore the bin values as I have suggested above.