Closed ShvartsmanIrina closed 3 months ago
Hi @ShvartsmanIrina , PureCN is not made for low coverage data. Anyways, looks like you didn't use gnomAD in the Mutect call. I recommend following https://gatk.broadinstitute.org/hc/en-us/articles/360035531132--How-to-Call-somatic-mutations-using-GATK4-Mutect2 as closely as possible (including the PoN filtering).
Dear, @lima1, thank you very much for the answer! I solved this problem by adding flags in the vcf results of normal samples by FilterMutectCalls and, accordingly, when calling the tumor sample, I added the --germline-resource flag. Regarding coverage, I am validating the workflow on samples that are not entirely suitable, but I intend to run it on tumor only ovarian cancer samples, when the final result should be an HRD score. If we're talking about coverage - what is the minimum coverage you recommend?
Depends what you want to do. Low coverage makes the variant classification very difficult, so don't expect reliable calls there. The SNP allelic fractions will also have a high standard deviation, so it won't help the segmentation a lot. We run panels with > 1000X coverage, so I don't have a lot of experience with low coverage data. 80X is probably a good minimum for what we promise in the paper. You can also try something like ichorCNA which is designed for very low coverage.
Thank you, @lima1 !
Hello lima, I'm trying to run PureCN with NormalDB, but I get Error: INFO [2024-03-11 20:23:58] PureCN 2.8.1 INFO [2024-03-11 20:23:58] ------------------------------------------------------------ INFO [2024-03-11 20:23:58] Arguments: -tumor.coverage.file /cromwell-executions/PurecnTumor/a9de9374-76c9-44cd-a3c6-e68e5fe130a8/call-PureCN/inputs/-285599999/102_S544_coverage_loess.txt.gz -log.ratio -seg.file -vcf.file /cromwell-executions/PurecnTumor/a9de9374-76c9-44cd-a3c6-e68e5fe130a8/call-PureCN/inputs/1991514784/102_S544.vcf -genome hg38 -sex ? -args.setPriorVcf 6 -args.setMappingBiasVcf /cromwell-executions/PurecnTumor/a9de9374-76c9-44cd-a3c6-e68e5fe130a8/call-PureCN/inputs/1662463198/mapping_bias_hg38.rds -args.filterIntervals 100,0.05 -args.segmentation 0.005,NULL, -sampleid 102_S544 -min.ploidy 1.4 -max.ploidy 6 -max.non.clonal 0.2 -max.homozygous.loss 0.05,1e+07 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -interval.file /cromwell-executions/PurecnTumor/a9de9374-76c9-44cd-a3c6-e68e5fe130a8/call-PureCN/inputs/1241338070/baits_hg38_intervals.txt -min.logr.sdev 0.15 -max.segments 300 -plot.cnv TRUE -vcf.field.prefix PureCN. -cosmic.vcf.file -DB.info.flag DB -POPAF.info.field POP_AF -Cosmic.CNT.info.field Cosmic.CNT -model beta -post.optimize TRUE -BPPARAM -log.file ./102_S544.log -normal.coverage.file -normalDB -args.filterVcf -fun.segmentation -test.num.copy -test.purity -speedup.heuristics INFO [2024-03-11 20:23:58] Loading coverage files... INFO [2024-03-11 20:24:02] Mean target coverages: 6X (tumor) 6X (normal). INFO [2024-03-11 20:24:04] Mean coverages: chrX: 1.68, chrY: 0.95, chr1-22: 1.73. INFO [2024-03-11 20:24:04] Mean coverages: chrX: 2.30, chrY: 0.95, chr1-22: 1.55. INFO [2024-03-11 20:24:14] Removing 129536 intervals with missing log.ratio. INFO [2024-03-11 20:24:14] Removing 100 low/high GC targets. INFO [2024-03-11 20:24:15] Removing 43878 intervals excluded in normalDB. INFO [2024-03-11 20:24:15] Removing 9905 intervals with low total coverage in normal (< 150.00 reads). INFO [2024-03-11 20:24:15] normalDB provided. Setting minimum coverage for segmentation to 0.0015X. INFO [2024-03-11 20:24:15] Removing 17731 low count (< 100 total reads) intervals. INFO [2024-03-11 20:24:15] Removing 1383 low coverage (< 0.0015X) intervals. INFO [2024-03-11 20:24:15] Using 42217 intervals (23238 on-target, 18979 off-target). INFO [2024-03-11 20:24:15] Ratio of mean on-target vs. off-target read counts: 0.11 INFO [2024-03-11 20:24:15] Mean off-target bin size: 99902 INFO [2024-03-11 20:24:15] AT/GC dropout: 0.83 (tumor), 0.79 (normal), 1.04 (coverage log-ratio). INFO [2024-03-11 20:24:15] High GC-bias in normal and/or tumor coverage. INFO [2024-03-11 20:24:15] Loading VCF... INFO [2024-03-11 20:24:29] Found 593360 variants in VCF file. INFO [2024-03-11 20:24:31] Removing 4120 triallelic sites. INFO [2024-03-11 20:24:32] Maximum of POPAP INFO is > 1, assuming -log10 scaled values WARN [2024-03-11 20:24:32] vcf.file has no DB info field for membership in germline databases. Found and used valid population allele frequency > 0.001000 instead. INFO [2024-03-11 20:24:33] 0 (0.0%) variants annotated as likely germline (DB INFO flag). FATAL [2024-03-11 20:24:33] VCF either contains no germline variants or variants are not properly
FATAL [2024-03-11 20:24:33] annotated.
FATAL [2024-03-11 20:24:33]
FATAL [2024-03-11 20:24:33] This is most likely a user error due to invalid input data or
FATAL [2024-03-11 20:24:33] parameters (PureCN 2.8.1).
I'm using Muteck2 both for building NormalDB and tumor sample: gatk Mutect2 \ -R ~{ref_fasta} \ -I ~{bam} \ -O ~{BAM_pre}.vcf \ --genotype-germline-sites true \ --genotype-pon-sites true \ --interval-padding 50
my tumor.vcf is:
fileformat=VCFv4.2
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2 --genotype-pon-sites true --genotype-germline-sites true --output 102_S544.vcf --interval-padding 50 --input /cromwell-executions/Mutect2/34dc964c-ae1e-4d3a-9d62-fd1164a46541/call-M1/inputs/-1728826228/102_S544.bam --reference /cromwell-executions/Mutect2/34dc964c-ae1e-4d3a-9d62-fd1164a46541/call-M1/inputs/1253363328/hg38.fa --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --flow-likelihood-parallel-threads 0 --flow-likelihood-optimized-comp false --trim-to-haplotype true --exact-matching false --flow-use-t0-tag false --flow-probability-threshold 0.003 --flow-remove-non-single-base-pair-indels false --flow-remove-one-zero-probs false --flow-quantization-bins 121 --flow-fill-empty-bins-value 0.001 --flow-symmetric-indel-probs false --flow-report-insertion-or-deletion false --flow-disallow-probs-larger-than-call false --flow-lump-probs false --flow-retain-max-n-probs-base-format false --flow-probability-scaling-factor 10 --flow-order-cycle-length 4 --keep-boundary-flows false --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false --mutect3-training-mode false --mutect3-ref-downsample 10 --mutect3-alt-downsample 20 --mutect3-non-artifact-ratio 20 --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --base-qual-correction-factor 5 --max-population-af 0.01 --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --minimum-allele-fraction 0.0 --independent-mates false --flow-mode NONE --disable-adaptive-pruning false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --pruning-seeding-lod-threshold 9.210340371976184 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --enable-legacy-graph-cycle-detection false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --num-matching-bases-in-dangling-end-to-recover -1 --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --dragstr-het-hom-ratio 2 --dont-use-dragstr-pair-hmm-scores false --pair-hmm-gap-continuation-penalty 10 --expected-mismatch-rate-for-read-disqualification 0.02 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --disable-symmetric-hmm-normalizing false --disable-cap-base-qualities-to-map-quality false --enable-dynamic-read-disqualification-for-genotyping false --dynamic-read-disqualification-threshold 1.0 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --flow-hmm-engine-min-indel-adjust 6 --flow-hmm-engine-flat-insertion-penatly 45 --flow-hmm-engine-flat-deletion-penatly 45 --pileup-detection false --use-pdhmm false --use-pdhmm-overlap-optimization false --make-determined-haps-from-pd-code false --print-pileupcalling-status false --fallback-gga-if-pdhmm-fails true --pileup-detection-enable-indel-pileup-calling false --pileup-detection-active-region-phred-threshold 0.0 --num-artificial-haplotypes-to-add-per-allele 5 --artifical-haplotype-filtering-kmer-size 10 --pileup-detection-snp-alt-threshold 0.1 --pileup-detection-indel-alt-threshold 0.1 --pileup-detection-absolute-alt-depth 0.0 --pileup-detection-snp-adjacent-to-assembled-indel-range 5 --pileup-detection-snp-basequality-filter 12 --pileup-detection-bad-read-tolerance 0.0 --pileup-detection-proper-pair-read-badness true --pileup-detection-edit-distance-read-badness-threshold 0.08 --pileup-detection-chimeric-read-badness true --pileup-detection-template-mean-badness-threshold 0.0 --pileup-detection-template-std-badness-threshold 0.0 --pileup-detection-filter-assembly-alt-bad-read-tolerance 0.0 --pileup-detection-edit-distance-read-badness-for-assembly-filtering-threshold 0.12 --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --override-fragment-softclip-check false --min-base-quality-score 10 --smith-waterman FASTEST_AVAILABLE --emit-ref-confidence NONE --max-mnp-distance 1 --force-call-filtered-alleles false --reference-model-deletion-quality 30 --soft-clip-low-quality-ends false --allele-informative-reads-overlap-margin 2 --smith-waterman-dangling-end-match-value 25 --smith-waterman-dangling-end-mismatch-penalty -50 --smith-waterman-dangling-end-gap-open-penalty -110 --smith-waterman-dangling-end-gap-extend-penalty -6 --smith-waterman-haplotype-to-reference-match-value 200 --smith-waterman-haplotype-to-reference-mismatch-penalty -150 --smith-waterman-haplotype-to-reference-gap-open-penalty -260 --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 --smith-waterman-read-to-haplotype-match-value 10 --smith-waterman-read-to-haplotype-mismatch-penalty -15 --smith-waterman-read-to-haplotype-gap-open-penalty -30 --smith-waterman-read-to-haplotype-gap-extend-penalty -5 --flow-assembly-collapse-hmer-size 0 --flow-assembly-collapse-partial-mode false --flow-filter-alleles false --flow-filter-alleles-qual-threshold 30.0 --flow-filter-alleles-sor-threshold 3.0 --flow-filter-lone-alleles false --flow-filter-alleles-debug-graphs false --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-extension-into-assembly-region-padding-legacy 25 --max-reads-per-alignment-start 50 --enable-legacy-assembly-region-trimming false --interval-set-rule UNION --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --minimum-mapping-quality 20 --max-read-length 2147483647 --min-read-length 30 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.5.0.0",Date="March 11, 2024 at 1:44:25 PM GMT">
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
MutectVersion=2.2
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
....
filtering_status=Warning: unfiltered Mutect 2 calls. Please run FilterMutectCalls to remove false positives.
source=Mutect2
tumor_sample=102_S544
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 102_S544
chr1 13273 . G C . . AS_SB_TABLE=18,17|11,10;DP=60;ECNT=1;MBQ=40,40;MFRL=195,214;MMQ=27,24;MPOS=27;POPAF=7.30;TLOD=48.28 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:35,21:0.407:56:22,14:8,7:31,21:18,17,11,10 chr1 13417 . C CGAGA . . AS_SB_TABLE=0,7|0,4;DP=14;ECNT=1;MBQ=40,40;MFRL=201,210;MMQ=22,24;MPOS=31;POPAF=7.30;RPA=2,4;RU=GA;STR;TLOD=14.22 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:7,4:0.377:11:1,0:1,0:7,4:0,7,0,4 chr1 17385 . G A . . AS_SB_TABLE=9,0|3,0;DP=12;ECNT=1;MBQ=40,40;MFRL=219,180;MMQ=34,34;MPOS=47;POPAF=7.30;TLOD=8.67 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:9,3:0.286:12:2,0:7,3:9,3:9,0,3,0 chr1 69511 . A G . . AS_SB_TABLE=0,0|24,14;DP=39;ECNT=1;MBQ=0,40;MFRL=0,195;MMQ=60,48;MPOS=20;POPAF=7.30;TLOD=140.01 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:0,38:0.972:38:0,18:0,14:0,33:0,0,24,14 chr1 121009 . C T . . AS_SB_TABLE=0,0|0,0;DP=3;ECNT=1;MBQ=40,40;MFRL=180,233;MMQ=29,60;MPOS=25;POPAF=7.30;TLOD=3.65 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:1,1:0.500:2:0,0:1,1:1,1:0,1,0,1
I would really appreciate your help!