Closed nithishak closed 2 years ago
Hi Nithisha,
Yes, something looks wrong here, but unfortunately nothing obvious stands out.
Regarding off-target: in R, load an example coverage file that was generated with --offtarget (PureCN::readCoverageFile). If there are few or no off-target reads where you expect some, make sure the BAM file is OK and has not been filtered to on-target reads only. All PureCN does is fill the gaps between baits with additional intervals in IntervalFile.R; Coverage.R then counts the reads that fall in those gaps. Check both the coverage.txt.gz and the corresponding coverage_loess.txt.gz.
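To illustrate the gap-filling idea, here is a minimal Python sketch. It is not PureCN's implementation: IntervalFile.R and Coverage.R apply additional rules that are not modeled here, and the function names `fill_gaps` and `count_reads`, the half-open coordinates, and the `min_gap` parameter are all assumptions made for this example. It only shows why a BAM filtered to on-target reads yields zero counts in every off-target interval.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def fill_gaps(baits: List[Interval], chrom_length: int, min_gap: int = 1) -> List[Interval]:
    """Return the intervals between baits (hypothetical helper, not PureCN code)."""
    gaps = []
    cursor = 0  # end of the region covered by baits so far
    for start, end in sorted(baits):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping baits
    if chrom_length - cursor >= min_gap:
        gaps.append((cursor, chrom_length))
    return gaps


def count_reads(intervals: List[Interval], read_starts: List[int]) -> List[int]:
    """Count read start positions that fall inside each interval."""
    return [sum(1 for r in read_starts if s <= r < e) for s, e in intervals]


baits = [(100, 200), (150, 300), (500, 600)]
off_target = fill_gaps(baits, chrom_length=1000)

# Unfiltered BAM: some reads land between the baits.
unfiltered = count_reads(off_target, [50, 120, 350, 700, 999])
# BAM filtered to on-target reads only: every off-target interval is empty.
filtered = count_reads(off_target, [120, 550])
```

If the off-target rows of a real coverage file look like `filtered` here, the BAM filtering is the first thing to check.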
GC dropout:
INFO [2022-07-24 19:10:34] AT/GC dropout: 1.07 (tumor), 1.03 (normal), 1.05 (coverage log-ratio).
This might indeed point to an issue. The important value is the log-ratio one: it should be very close to 1.0, meaning that tumor and normal have identical GC bias.
Double-check that there are no obvious tumor/normal swaps (unlikely, but worth checking). Then try running without explicit GC normalization: instead of coverage_loess.txt.gz, use the original coverage.txt.gz in both NormalDB.R and PureCN.R. That hopefully brings the log-ratio GC bias to 1 (the other two will likely be higher, but that's OK; the important part is that the log-ratio no longer has any bias).
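When comparing runs with and without GC normalization, it can help to pull the three dropout values out of each log. The following Python sketch parses a line in the format shown above. The helper names and the `tolerance` of 0.02 are assumptions for illustration; they are not a threshold taken from PureCN.

```python
import re

# Matches the "AT/GC dropout" log line format quoted in this thread.
DROPOUT_RE = re.compile(
    r"AT/GC dropout: ([\d.]+) \(tumor\), ([\d.]+) \(normal\), "
    r"([\d.]+) \(coverage log-ratio\)"
)


def parse_dropout(line):
    """Return the three dropout values as floats, or None if the line does not match."""
    m = DROPOUT_RE.search(line)
    if m is None:
        return None
    tumor, normal, log_ratio = (float(x) for x in m.groups())
    return {"tumor": tumor, "normal": normal, "log_ratio": log_ratio}


def log_ratio_biased(values, tolerance=0.02):
    """True if the log-ratio dropout is further from 1.0 than the (assumed) tolerance."""
    return abs(values["log_ratio"] - 1.0) > tolerance


line = ("INFO [2022-07-24 19:10:34] AT/GC dropout: 1.07 (tumor), "
        "1.03 (normal), 1.05 (coverage log-ratio).")
values = parse_dropout(line)
biased = log_ratio_biased(values)
```

Only the log-ratio value is checked, because the tumor and normal values may legitimately be above 1 once GC normalization is switched off.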
INFO [2022-07-24 19:11:12] Mean standard deviation of log-ratios: 0.33 (MAPD: 0.23).
That hopefully also gets down to something closer to 0.15 if those are young FFPE or fresh frozen samples.
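For readers unfamiliar with MAPD: it is commonly defined as the median absolute pairwise difference between adjacent log-ratios, which makes it less sensitive to real copy-number breakpoints than a plain standard deviation. The Python sketch below uses that common definition on made-up numbers; how PureCN computes the value internally is not shown in this thread, so treat this as an illustration only.

```python
from statistics import median, stdev


def mapd(log_ratios):
    """Median absolute difference between neighbouring log-ratios."""
    diffs = [abs(b - a) for a, b in zip(log_ratios, log_ratios[1:])]
    return median(diffs)


# Hypothetical log-ratios: a quiet region followed by a one-copy gain.
# The single breakpoint inflates the standard deviation but barely moves MAPD.
log_ratios = [0.0, 0.1, 0.0, 0.1, 0.0, 0.6, 0.5, 0.6, 0.5, 0.6]
noise = mapd(log_ratios)
spread = stdev(log_ratios)
```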
If this did not fix it, can you post the B-allele frequency/coverage plots of both PureCN and Sequenza?
Hello Dr. Riester,
Thank you. From your comments, I can see that my first error might have been that I am using a BAM file that has been filtered for on-target reads only.
I have run my workflow again and confirmed with PureCN::readCoverageFile that, when I use --off-target, many of the 23,851 positions have 0 coverage. This is probably why my Step 7 also fails. When I do not use --off-target, far fewer of the 8,056 positions have 0 coverage. This runs to completion and produces consistent cellularity and ploidy estimates.
I would like to edit my workflow to use the unfiltered BAM and see how the results differ, but I have a few follow-up questions.
Thank you!
gatk GenomicsDBImport -R /db/ngdx/references/hg38/Reference/PATCHED/sequence_u2af1_fix.v1.2020_04_01.fa -L files/gatk4_hm2_hg38.preprocessed.interval_list --interval-padding 200 --merge-input-intervals --genomicsdb-workspace-path files/gatk4_m2_hm2_pon_db -V sample_lists/normal_vcfs_gatk4_hm2_2022-01-10.list
Hope that helps, Markus
Describe the issue
When I run PureCN on 10 tumor samples and use their 10 paired germline samples to create a reference, I get cellularity and ploidy results that differ from Sequenza's. Moreover, I can only run the script if I exclude the --off-targets parameter in Step 1 below. I am hoping to get some advice on whether my workflow is correct. I followed the steps from the vignette.
To Reproduce
Step 1 - Generating interval files
Step 2 - Run Mutect on 10 germline samples, create a GenomicsDB and use CreateSomaticPanelOfNormals
Step 3 - Run Mutect on unmatched tumor samples
Step 4 - Calculate GC-normalized coverages for germlines
Step 5 - Calculate GC-normalized coverages for tumors
Step 6 - Create normal DB
Step 7 - Calculate tumor cellularity and ploidy
Expected behavior
I expected to get the same cellularity and ploidy results as from Sequenza, but the results are significantly different. I might need to alter the workflow, as the “HIGH AT- OR GC-DROPOUT” flags in the PureCN results don’t reflect the lab QC results (the samples do not show high dropout). Also, the off-target rate for our targeted capture assay is around 30%, so I expect to include --off-targets in Step 1, but it gives the error shown below.
Log file
If I include the --off-targets parameter in Step 1, I get this error in Step 6.
Here is some information from the logs of one of the tumor samples from Step 7.