lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

BAF segmentation #44

Closed berguner closed 5 years ago

berguner commented 5 years ago

Hello,

I am observing erroneous segmentation on the BAF plots. Some of the segments also don't seem to align well with the average BAF values. Is BAF segmentation performed independently from the log2 coverage ratios?

Please see the attached image as an example case. In this image you can also see 2 cnLOH events were missed by the segmentation.

Best, Bekir

[attached image: purecn_baf_segmentation]

lima1 commented 5 years ago

Yes, this is a terrible fit due to an extremely noisy segmentation. Are the QC metrics for this sample otherwise ok? Does your pool of normals contain only normals captured with exactly the same capture kit?

lima1 commented 5 years ago

You can try the PSCBS segmentation, but a noisy coverage profile like this is usually hard to correct.

berguner commented 5 years ago

Yes, my pool of normals contains only normal samples prepared with the same protocol. The only major difference is that the normal samples are fresh blood, whereas the tumor samples are FFPE. These samples are indeed not of the best quality because they are FFPE, but we don't have any other options.

Do you think that a less segmented profile would produce better fitting results? What would be the advantage of using PSCBS?

lima1 commented 5 years ago

Can you post the log-file as well? I might be able to point you to improvements in the setup.

The default segmentation uses only coverage, but then uses BAF to improve the segmentation. PSCBS uses both BAF + coverage. The problem is that only 10-15% of probes contain heterozygous SNPs, so using both together isn't necessarily better. The default usually works well for a wide range of samples. But worth a try.

There are a few options in DNAcopy to force a more aggressive segmentation. The default is careful not to miss segments in low purity samples.

But either way, with old FFPE samples I routinely toss probably 10% after manual curation (the ones that look like that). If you fail the sample in the Sampleid.csv file, amplifications are still called based on a high tumor vs. normal log-ratio, so you should still get amplifications.
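When curating a batch, it can help to first list which per-sample curation files were flagged and why. The following is only a sketch: it assumes each Sampleid.csv has the columns Sampleid, Purity, Ploidy, Sex, Contamination, Flagged, Failed, Curated and Comment, and that flags appear as text in the Comment column. The two files, their values and the helper name `list_flagged` are hypothetical, made up purely for illustration, so check them against your own output.

```shell
# Hypothetical curation files, assuming the column layout named above.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sampleA.csv" <<'EOF'
"Sampleid","Purity","Ploidy","Sex","Contamination","Flagged","Failed","Curated","Comment"
"sampleA",0.65,2.1,"F",0,TRUE,FALSE,FALSE,"NOISY SEGMENTATION"
EOF
cat > "$tmpdir/sampleB.csv" <<'EOF'
"Sampleid","Purity","Ploidy","Sex","Contamination","Flagged","Failed","Curated","Comment"
"sampleB",0.40,1.9,"M",0,FALSE,FALSE,FALSE,NA
EOF

# Print "sample<TAB>comment" for every row whose Flagged column is TRUE.
# Columns are looked up by header name; the naive comma split assumes
# there are no commas inside quoted fields.
list_flagged() {
    awk -F, '
        FNR == 1 {
            for (i = 1; i <= NF; i++) { h = $i; gsub(/"/, "", h); col[h] = i }
            next
        }
        $(col["Flagged"]) == "TRUE" {
            s = $(col["Sampleid"]); c = $(col["Comment"])
            gsub(/"/, "", s); gsub(/"/, "", c)
            print s "\t" c
        }
    ' "$@"
}

list_flagged "$tmpdir"/*.csv
```

With the two toy files above, only sampleA is listed. Looking columns up by name rather than by position keeps the sketch working if the column order differs in your version.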

berguner commented 5 years ago

Yes, it was mentioned in the .csv file that the segmentation was noisy. Unfortunately we have the same problem in half of our samples.

I was expecting to have enough heterozygous SNPs because these are whole-exome samples. I guess that's not the case then.

Please check the attached log file for this sample, I would appreciate any suggestions.

N08_0007_S24887.log

lima1 commented 5 years ago

Looks good in general, so it's unlikely to be an issue with the setup.

You get a warning that small targets were removed, so you are probably using a BED file with the targeted exons. Do you have access to the positions of the actual baits? The advantage is that these intervals are more uniform in width, although this won't make a huge difference. You can also try making the segmentation parameters more aggressive (see PureCN.R --help), especially in the clearly higher purity samples like this one.

You can also try to increase the interval padding to 75 or 100bp. This might give you a few more SNPs for --funsegmentation PSCBS. Make sure that the mapping bias database includes the padding as well.

If you ran Picard hsmetrics on those samples, a PCA of the QC metrics might give you an explanation why samples are noisy.

berguner commented 5 years ago

I used PSCBS for segmentation and set --alpha 0.001 which improved the results quite a lot.

Thanks for helping.
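For anyone who wants to try the same combination, here is a minimal dry-run sketch. Only `--funsegmentation PSCBS` and `--alpha 0.001` come from this thread; the helper name `purecn_pscbs_cmd` and the `PURECN_SCRIPT` variable are hypothetical, and the rest of the arguments are deliberately passed through unchanged, so check `PureCN.R --help` for the options your version supports.

```shell
# Dry run: print the PureCN.R call instead of executing it.
# PURECN_SCRIPT is a hypothetical variable holding the path to PureCN.R.
PURECN_SCRIPT="${PURECN_SCRIPT:-PureCN.R}"

# $1 is the alpha value; all remaining arguments are your usual inputs,
# passed through untouched ahead of the two segmentation flags.
purecn_pscbs_cmd() {
    alpha="$1"
    shift
    echo "Rscript $PURECN_SCRIPT $* --funsegmentation PSCBS --alpha $alpha"
}

# The placeholder stands in for your existing input arguments.
purecn_pscbs_cmd 0.001 "<usual input arguments>"
```

Printing the command first makes it easy to review it, or to loop over several alpha values, before committing to a long run.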

lima1 commented 5 years ago

I improved the PSCBS segmentation by supporting interval weights. If you provide the interval_weights.txt file generated by NormalDB.R, intervals with high variance in the pool of normals are down-weighted. To install:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("lima1/PSCBS", ref="add_dnacopy_weighting")
BiocManager::install("lima1/PureCN")

In case you try it out, I would be curious to get some feedback. Thanks!