mskcc / facets

Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.

smoothing of logR and logOR before segmentation #102

Open mheskett opened 6 years ago

mheskett commented 6 years ago

Is it possible to smooth these signals before segmenting? I am only interested in large, high-confidence segments. Despite optimizing the FACETS parameters, I still see spurious segment calls because of the noise inherent in exome sequencing.

veseshan commented 6 years ago

Can you provide a sample figure so that I can understand what kind of spurious segments you are talking about?

mheskett commented 5 years ago

image

There are many small spurious segments, whereas I only care about the obvious large regions here.

It would be easy to draw a line through these samples by eye, so I hope FACETS can recognize the LOH here. I used cval 320 and min snp 50, and I filtered at q30, Q30 with snp-count.

image

veseshan commented 5 years ago

I am not sure that any of the segments look spurious. The segments seem to be supported by a lot of loci. By the way, is this WGS? I haven't seen a log-odds-ratio plot with so many points. It is odd that there are so many copy-neutral LOH segments and no other copy number changes. There are 3 segments with total copy number greater than 2 (in chr 2, 14 and 16), but those don't look narrow.

One issue is that the tumor is very pure, i.e. the cellular fraction is close to 100%, and this is causing the mafR estimate to be off. I see a pronounced effect in chromosome 10, among others. The first segment there is an LOH, but the estimated mafR is a lot closer to zero than the data suggest. This could be due to a few intermediate points on the boundary.

If you don't care about narrow segments, you can set a minimum width threshold and filter them out post hoc. Since focal amplifications are narrow, we don't want to smooth them out. Also, smoothing typically removes singleton outliers, not narrow segments.
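As a rough illustration of the post-hoc filter suggested above, here is a minimal Python sketch. It is not part of FACETS: the record layout (`chrom`, `start`, `end`, `num_mark`), the function name `filter_narrow_segments`, and the 5 Mb threshold are all assumptions chosen for the example, and the segment table would need to be exported from the fit in whatever form is convenient.

```python
# Hypothetical post-hoc filter; field names and thresholds are assumptions,
# not FACETS's own API or defaults.
def filter_narrow_segments(segments, min_width_bp=5_000_000, min_markers=0):
    """Keep segments spanning at least min_width_bp bases (end - start)
    and supported by at least min_markers loci."""
    kept = []
    for seg in segments:
        width = seg["end"] - seg["start"]
        if width >= min_width_bp and seg.get("num_mark", 0) >= min_markers:
            kept.append(seg)
    return kept


# Made-up example segments: two broad ones and one narrow one.
segments = [
    {"chrom": 1, "start": 1_000_000, "end": 60_000_000, "num_mark": 4200},
    {"chrom": 1, "start": 60_000_001, "end": 60_800_000, "num_mark": 35},
    {"chrom": 2, "start": 500_000, "end": 90_000_000, "num_mark": 6100},
]

wide = filter_narrow_segments(segments, min_width_bp=5_000_000)
for seg in wide:
    print(seg["chrom"], seg["start"], seg["end"])
```

Note that this simply discards narrow segments rather than merging them into their neighbours, so any focal amplification below the threshold is dropped as well. It is only suitable when the large segments are all that matter.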