lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Reading Filtering when doing Mutect2 #170

Pitithat-pu closed this issue 3 years ago

Pitithat-pu commented 3 years ago

Hi, I tried PureCN for CNV calling and tumor purity estimation on WES of cfDNA. I have cfDNA from patients with buffy coat controls, but they use a different capture kit version. So I selected ~20 cfDNA from those samples to create a normalDB.

I ran Mutect2 in tumor-only mode with extra read-filtering options, as shown here, to get the VCFs both for creating the normal DB and for running the individual cfDNA samples:

gatk Mutect2 -R hs37d5_PhiX.fa -I ${bamfile} -max-mnp-distance 0 --min-base-quality-score 20 --read-filter MappingQualityReadFilter --read-filter OverclippedReadFilter --minimum-mapping-quality 30 --read-filter FragmentLengthReadFilter --min-fragment-length 30 -O ${output_dir}/${filename}.vcf.gz

I would like to ask whether these parameters (e.g. minimum-mapping-quality 30) are suitable or necessary for the downstream PureCN steps, e.g. mapping bias estimation.

Possibly another question: would it be better to run Mutect2 with the buffy coat control rather than in tumor-only mode?

Thanks, Pitithat

lima1 commented 3 years ago

Hi Pitithat,

If you install the GenomicsDB-R package (see the Best Practices vignette), you can use the same pool of normals database for PureCN that you use for Mutect2. In general, if your parameters work fine with Mutect2, they should be fine with PureCN too. I would suggest following GATK's best practices as closely as possible: https://gatk.broadinstitute.org/hc/en-us/articles/360035531132

It's not optimal to use normals from a different capture kit, but usually better than nothing if everything else is similar.

Please note that normal controls are used for two different purposes. One is the Mutect2 pool of normals for variant filtering and mapping bias removal. The other is normalizing coverage. For coverage normalization it is crucial that the normals are from the same capture kit; unfortunately there is no way around it.

Run Mutect2 with the 20 buffy coat controls as pool of normals (as described in the best practices under "A step-by-step guide to the new Mutect2 Panel of Normals Workflow").
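For readers who want to see the shape of that workflow, here is a minimal dry-run sketch. It only prints the GATK commands rather than executing them. The BAM names, the interval list, the workspace name `pon_db`, and the output names are hypothetical placeholders; only the reference name comes from the command earlier in this thread. The two `--genotype-*` options on the final call are, to my understanding, what PureCN expects so that germline and panel sites stay in the VCF; please check them against your GATK and PureCN versions.

```shell
#!/bin/sh
# Dry-run sketch of a Mutect2 panel-of-normals workflow.
# Prints commands only; all file names are hypothetical placeholders.
set -eu

REF="hs37d5_PhiX.fa"                              # reference used earlier in this thread
INTERVALS="targets.interval_list"                 # hypothetical capture-kit intervals
NORMALS="normal01.bam normal02.bam normal03.bam"  # hypothetical buffy coat BAMs
TUMOR="cfdna01.bam"                               # hypothetical cfDNA sample

# Print the command; replace the body with "$@" to execute instead.
run() { echo "$*"; }

print_workflow() {
  vcf_args=""
  for bam in $NORMALS; do
    vcf="${bam%.bam}.vcf.gz"
    # Step 1: call each normal on its own in tumor-only mode.
    run gatk Mutect2 -R "$REF" -I "$bam" --max-mnp-distance 0 -O "$vcf"
    vcf_args="$vcf_args -V $vcf"
  done
  # Step 2: collect the per-normal calls in a GenomicsDB workspace.
  # $vcf_args is left unquoted on purpose so it splits into separate arguments.
  run gatk GenomicsDBImport -R "$REF" -L "$INTERVALS" \
    --genomicsdb-workspace-path pon_db $vcf_args
  # Step 3: build the panel of normals from the workspace.
  run gatk CreateSomaticPanelOfNormals -R "$REF" -V gendb://pon_db -O pon.vcf.gz
  # Step 4: call the cfDNA sample against the panel.
  run gatk Mutect2 -R "$REF" -I "$TUMOR" --panel-of-normals pon.vcf.gz \
    --genotype-germline-sites true --genotype-pon-sites true -O cfdna01.vcf.gz
}

print_workflow
```

Any read filters you use for the cfDNA samples (such as the mapping quality and fragment length filters above) would presumably be added identically to the Step 1 and Step 4 calls, so that the normals and the tumors are processed the same way.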

Best, Markus

Pitithat-pu commented 3 years ago

Hi Markus,

Thank you for your reply. I followed the Mutect2 "A step-by-step guide to the new Mutect2 Panel of Normals Workflow", except that I did not use the GenomicsDB-R package. Instead, I used gatk CombineVariants (--minimumN 3) to create a PoN VCF, following the PureCN bioc vignette. I hope this still works.

Best, Pitithat

lima1 commented 3 years ago

Yep, that still works. GenomicsDB-R would be equivalent to --minimumN 1, with a minor bug fixed: https://github.com/lima1/PureCN/issues/52. In the worst case, the bug lets a small number of artifacts through the filter.
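To make the --minimumN comparison concrete, here is a toy sketch of the threshold itself: a site is kept only if it was observed in at least N of the normals. This is not how CombineVariants or GenomicsDB-R are implemented; the sample names and sites are made up, and the helper names `calls` and `sites_with_min_n` are hypothetical.

```shell
#!/bin/sh
# Toy illustration of a "seen in at least N normals" threshold.
# Not the CombineVariants implementation; data and names are made up.
set -eu

# Emit "normal<TAB>site" pairs, one line per variant call.
calls() {
  printf '%s\t%s\n' \
    n1 1:1000:A:G  n2 1:1000:A:G  n3 1:1000:A:G \
    n1 2:5000:C:T  n2 2:5000:C:T \
    n3 7:42:G:A
}

# Print the sites observed in at least $1 distinct normals.
sites_with_min_n() {
  calls | sort -u | cut -f2 | sort | uniq -c |
    awk -v n="$1" '$1 >= n { print $2 }'
}

echo "minimumN 3:"; sites_with_min_n 3
echo "minimumN 1:"; sites_with_min_n 1
```

With this toy data, a threshold of 3 keeps only the site called in all three normals, while a threshold of 1 keeps every site that appears in any normal.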