mskcc / facets

Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.

Unlikely genome wide aUPD #96

Open felixfrenkel opened 6 years ago

felixfrenkel commented 6 years ago

Dear Venkatraman,

I can't figure out why FACETS infers a genome-wide 2:0 state for one of my samples (which is highly unlikely to be the case) but finds 2:1 in another sample that is very similar by logOR baseline.

2:0 image

2:1 image

Thank you!

veseshan commented 6 years ago

The UPD call is made because the allelic imbalance (the log-odds-ratio part) in the first figure is too high for balanced segments. The second figure also shows some allelic imbalance, but it does not seem to reach the threshold for being called UPD. The question is why the allelic imbalance is so high. Is the coverage depth too low?
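To make the coverage question concrete, here is a minimal sketch of a per-SNP log odds ratio and its approximate standard error. It assumes the usual definition (alt-vs-ref odds in tumor divided by the same odds in normal) with a 0.5 continuity correction. The function names and the read counts are illustrative, and this is not necessarily how FACETS computes the value internally.

```python
import math


def log_odds_ratio(t_ref, t_alt, n_ref, n_alt, pseudo=0.5):
    """Log odds ratio of alt vs ref reads, tumor relative to normal.

    `pseudo` is an assumed continuity correction that avoids log(0).
    """
    return (math.log(t_alt + pseudo) - math.log(t_ref + pseudo)
            - math.log(n_alt + pseudo) + math.log(n_ref + pseudo))


def log_odds_ratio_se(t_ref, t_alt, n_ref, n_alt, pseudo=0.5):
    """Approximate standard error: sqrt of the summed reciprocal counts."""
    counts = (t_ref, t_alt, n_ref, n_alt)
    return math.sqrt(sum(1.0 / (c + pseudo) for c in counts))


# Illustrative counts only: a het SNP at depth 100 (tumor) and 68 (normal).
balanced = log_odds_ratio(50, 50, 34, 34)      # no allelic imbalance
imbalanced = log_odds_ratio(75, 25, 34, 34)    # tumor skewed toward ref
se_deep = log_odds_ratio_se(50, 50, 34, 34)
se_shallow = log_odds_ratio_se(10, 10, 7, 7)   # same ratios, 5x fewer reads

print(balanced, imbalanced, se_deep, se_shallow)
```

With the same underlying allele ratios, the shallower SNP has a larger standard error, so its logOR scatters further from zero by chance alone. That is one reason low depth can look like allelic imbalance.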

felixfrenkel commented 6 years ago

Median T/N coverage for the first sample is 100/68. Can I change the allelic imbalance threshold so that the first sample is called 2:1? What are the possible reasons for such an imbalance in the raw data, assuming the sample is balanced?

veseshan commented 6 years ago

Having stared at the plots some more, I can see that the log-ratio in the first plot is a lot more skewed (many points below -1.5) than in the second plot. This could be due to the insert size being very different between tumor and normal, which may affect logOR as well.

Another possibility is that the normal is contaminated. This can induce an imbalance in the ref-vs-alt ratio in the normal, which in turn affects logOR. You can plot the vafN data to see if that may be the case.
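As a rough illustration of that check, the sketch below computes the variant allele fraction in the normal from ref and alt read counts and summarizes it as a coarse histogram plus the median over apparently heterozygous SNPs. The function name, the 0.25 to 0.75 heterozygosity window, and the example counts are assumptions made for illustration, not values taken from FACETS or from these samples.

```python
from statistics import median


def normal_vaf_summary(ref_counts, alt_counts,
                       het_low=0.25, het_high=0.75, bins=10):
    """Summarize normal-sample VAF (alt / depth) per SNP.

    The het window (het_low, het_high) is an assumed cutoff.
    """
    vafs = [a / (r + a) for r, a in zip(ref_counts, alt_counts) if r + a > 0]
    het = [v for v in vafs if het_low < v < het_high]
    hist = [0] * bins
    for v in vafs:
        # VAF of exactly 1.0 is placed in the last bin.
        hist[min(int(v * bins), bins - 1)] += 1
    return {
        "n_snps": len(vafs),
        "n_het": len(het),
        "het_median": median(het) if het else None,
        "hist": hist,
    }


# Made-up counts: four het-looking SNPs and two homozygous ones.
ref = [30, 32, 60, 0, 35, 28]
alt = [30, 28, 0, 60, 25, 32]
summary = normal_vaf_summary(ref, alt)
print(summary)
```

In a clean normal one would expect the histogram mass to sit near 0, 0.5, and 1, with the het median close to 0.5. A het median that drifts away from 0.5, or extra mass between the peaks, would be worth a closer look.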

felixfrenkel commented 6 years ago

Thank you, Venkatraman! It looks like a difference in T/N insert size is not the cause: the two have very similar distributions. We see no signs of contamination either. There does appear to be a factor associated with the logOR disturbance: GC content. Samples with GC ~ 45% (like the two posted above) show high imbalance, while samples with GC ~ 55% have much better logOR. Does FACETS apply GC correction to the logOR metric?

veseshan commented 6 years ago

Conceptually there is no reason to expect a relationship between GC content and logOR, since logOR measures allelic imbalance between tumor and normal. There should be barely any difference in GC content between DNA fragments carrying the ref and alt alleles, yet this association would imply that GC affects them differentially. I believe such an effect is an artifact.
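One way to test whether such an association shows up within a single sample is to correlate the local GC fraction around each heterozygous SNP with the magnitude of its logOR. The sketch below is a generic diagnostic under that assumption. The helper names, the flanking sequences, and the logOR values are hypothetical, and nothing here is part of FACETS.

```python
import math


def gc_fraction(seq):
    """Fraction of G/C among the A/C/G/T bases in a sequence window."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical flanking windows and per-SNP logOR values.
windows = ["ATATATGC", "GGCCAATT", "GCGCGCAT", "GGGCCCGC"]
log_or = [0.9, -0.6, 0.3, -0.1]

gc = [gc_fraction(w) for w in windows]
r = pearson(gc, [abs(v) for v in log_or])
print(gc, r)
```

A correlation near zero across many SNPs would be consistent with the expectation that GC content does not affect ref and alt fragments differently. A strong correlation would point to a technical artifact worth tracing upstream.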