lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Mismatch between tumor purity and number of copy number alterations #349

Closed gargid98 closed 5 months ago

gargid98 commented 5 months ago

Hello, thank you for developing this tool and for your helpful responses on GitHub! I am running PureCN on fresh frozen anal precancer samples. We previously ran Ichor on ULP-WGS samples from the same tissue type and got much lower tumor purity (0.05 to 0.10) with clearer copy number alterations. With PureCN, many of the samples with a relatively high tumor purity for the tumor type we are working with (~0.30) have very few copy number alterations. I'm wondering whether any of the parameters I'm using should be adjusted, or how to interpret this discrepancy. Here are the PureCN commands I used:

Rscript $PURECN/IntervalFile.R --force --in-file /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/purecn/interval_list.bed --fasta /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/scratch/Homo_sapiens_assembly19.fasta --out-file baits_hg19_intervals_mappability.txt --off-target --mappability /sc/arion/projects/BiNGS/bings_analysis/projects/2023/keithsigel/analprecancer_wes/damleg01/wgEncodeCrgMapabilityAlign100mer.bigWig

Rscript $PURECN/Coverage.R --out-dir /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/purecn/normal_test --bam /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/raw/bam_2/38359A/v1/38359A.bam --intervals baits_hg19_intervals_mappability.tx

Rscript $PURECN/Coverage.R --out-dir /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/purecn/tumor_test --bam /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/raw/bam_2/37258A/v1/37258A.bam --intervals baits_hg19_intervals_mappability.txt

Rscript $PURECN/PureCN.R --out /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/purecn/purecn_command_output_0121/ --tumor /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/purecn/tumor_test/37258A_coverage_loess.txt.gz --sampleid 37258A --vcf /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/mutect/FilterMutectCalls/37258A-filtered-gatk43.vcf --normaldb /sc/arion/projects/BiNGS/bings_omics/data/bings/2023/keithsigel/analprecancer_wes/purecn/normaldb/normalDB_hg19.rds --intervals baits_hg19_intervals.txt --genome hg19

Log file of a sample with relatively higher purity and low number of alterations: 37258A.log

B-allele frequency plot of the maximum likelihood solution: image

Session Info

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /hpc/packages/minerva-centos7/intel/parallel_studio_xe_2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.2.0

Please let me know if anything else is needed!

lima1 commented 5 months ago

Hi. This looks like a normal sample normalized against a PON that includes it. Is this possible? That would explain why a large fraction of the log ratios in the second panel are 0. When there are no copy number alterations, the purity estimate is unreliable. It should return the lowest purity candidate as the first solution, though.
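To illustrate why purity is unreliable without copy number alterations, here is a minimal sketch of the usual purity/ploidy mixture model. This is not PureCN's actual code; the function names and example values are made up for illustration, and a diploid normal is assumed. For a diploid tumor segment in a diploid genome, the expected log ratio is 0 and the expected allelic fraction of a heterozygous germline SNP is 0.5 at every purity, so the data cannot tell 5% from 30%.

```python
import math


def expected_log_ratio(purity, tumor_copies, tumor_ploidy, normal_copies=2):
    """Expected log2 tumor/normal coverage ratio of one segment.

    Hypothetical helper: mixes tumor and normal copy number by purity and
    divides by the same mixture computed for the genome-wide average.
    """
    segment = purity * tumor_copies + (1 - purity) * normal_copies
    genome = purity * tumor_ploidy + (1 - purity) * normal_copies
    return math.log2(segment / genome)


def expected_het_snp_fraction(purity, tumor_copies, b_copies, normal_copies=2):
    """Expected allelic fraction of a heterozygous germline SNP.

    b_copies is the number of tumor copies carrying the B allele; the
    normal cells are assumed to carry exactly one copy of it.
    """
    b_alleles = purity * b_copies + (1 - purity) * 1
    all_alleles = purity * tumor_copies + (1 - purity) * normal_copies
    return b_alleles / all_alleles


if __name__ == "__main__":
    for purity in (0.05, 0.30, 0.90):
        # Copy-neutral diploid segment: nothing here depends on purity.
        neutral = expected_log_ratio(purity, tumor_copies=2, tumor_ploidy=2.0)
        baf = expected_het_snp_fraction(purity, tumor_copies=2, b_copies=1)
        # Single-copy loss: the log ratio now shifts with purity.
        loss = expected_log_ratio(purity, tumor_copies=1, tumor_ploidy=2.0)
        print(purity, neutral, baf, loss)
```

Only the single-copy loss column changes with purity, which is the kind of signal needed to pick a purity value in the first place.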

Apart from that, the number of heterozygous SNPs is way too low for WES. Make sure to run Mutect with 50-75 bp padding. You can also experiment with the different segmentation methods; GATK works pretty well with WES.
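A quick way to get a rough feel for how many heterozygous SNPs a VCF contains is to count SNVs with a balanced allelic fraction. The sketch below is only a sanity check under stated assumptions: it reads the `AD` FORMAT field as Mutect2 writes it, and the function name, the 0.3 to 0.7 window and the depth cutoff of 15 are arbitrary illustrative choices. It is not how PureCN classifies germline variants.

```python
def count_candidate_het_snps(vcf_lines, sample_index=0,
                             min_af=0.3, max_af=0.7, min_depth=15):
    """Count biallelic SNVs whose allelic fraction looks heterozygous.

    Hypothetical helper. vcf_lines is any iterable of uncompressed VCF
    lines; sample_index selects the sample column (0 = first sample).
    """
    count = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            continue  # no genotype column for the requested sample
        ref, alt = fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and multi-allelic sites
        sample = dict(zip(fields[8].split(":"),
                          fields[9 + sample_index].split(":")))
        try:
            # AD holds comma-separated read counts: ref first, then alt.
            ref_reads, alt_reads = (int(x) for x in sample["AD"].split(",")[:2])
        except (KeyError, ValueError):
            continue  # AD missing or not numeric
        depth = ref_reads + alt_reads
        if depth >= min_depth and min_af <= alt_reads / depth <= max_af:
            count += 1
    return count
```

Passing an open file handle of an uncompressed VCF as `vcf_lines` is enough to try it. If the count changes a lot after rerunning Mutect with padding, that would suggest many SNPs in the flanking regions were previously being missed.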