Closed lmanchon closed 11 months ago
It's not working automatically? Internally it will create sex-specific PoNs. Not sure I ever tested this thoroughly, but I expect it to work.
it's strange i have no result on X and Y, PureCN detect anomalies on autosomal chromosomes but not on X and Y, do i set some parameter to recover anomalies on chrX and chr Y ?
Ah, got it. Yes, the likelihood model only supports diploid genomes, so it ignores X/Y for males.
Keeping the X/Y segmentation is a fair feature request though. Not sure I find time for it soon.
You might be able to force the use of X/Y by providing sex="diploid". Not sure there are any side effects though. But might be enough to get the segmentation and merge it then back to the regular run.
is NormalDB.R script accept the parameter sex="diploid" ?
PureCN.R.
oh yes of course, sorry, I confused normalDb with PureCN
and i have another question because i obtain too much noisy results: is there a parameter to adjust (such as --drop-low-coverage (per bin) in cnvkit) to reduce the noisy background ?
my command was: Rscript $PURECN/PureCN.R --out $OUTPUTS/$lib --tumor $OUTPUTS/${lib}"_recalibrated_coverage_loess.txt.gz" --sampleid $lib --vcf $OUTPUT_GATK_CNV/${lib}"_Mutect2.vcf.gz" --stats-file $OUTPUT_GATK_CNV/${lib}"_Mutect2.vcf.gz.stats" --fun-segmentation PSCBS --sex=diploid --cores=16 --normaldb $OUT_REF/normalDB_hg38.rds --mapping-bias-file $OUT_REF/mapping_bias_TWIST_hg38.rds --intervals $OUT_REF/BED_hg38_intervals.txt --genome hg38 --model betabin --post-optimize --seed 123
Can you post the log file? It will tell you exactly what it used for filtering low quality bins.
The --stats-file is only for Mutect1. Also have a look at #320 if you have many MBQ 20 variants.
PSX-AD-001.log here attachment log file
Looks fine, but you indeed suffer the same issue with MBQ 20. Install the version from the issue_320 branch: BiocManager::install("lima1/PureCN", ref = "issue_320")
And then re-run PureCN.R with --min-base-quality 20
ok i have installed the version from issue_320
Rscript $PURECN/PureCN.R -v 2.7.11
then: Rscript $PURECN/PureCN.R --out $OUTPUTS/$lib --tumor $OUTPUTS/${lib}"_recalibrated_coverage_loess.txt.gz" --sampleid $lib --vcf $OUTPUT_GATK_CNV/${lib}"_Mutect2.vcf.gz" --fun-segmentation PSCBS --sex=diploid --cores=16 --min-base-quality 20 --normaldb $OUT_REF/normalDB_TWIST_hg38.rds --mapping-bias-file $OUT_REF/mapping_bias_TWIST_hg38.rds --intervals $OUT_REF/DIGI_BED_hg38_intervals.txt --genome hg38 --model betabin --post-optimize --seed 123
and same output with the same number of CNVs.
You should have now a much higher number of SNPs in the log file.
INFO [2023-09-19 16:42:29] 2.0% of targets contain variants.
That should be now > 10%.
That number should have dropped significantly:
INFO [2023-09-19 16:42:28] Removing 14110 low quality variants with non-offset BQ < 25.
ok, yes i understand. I have used PureCN and CNVkit, the 2 tools give me approximatively the same number of variants ~ 140 but when i use GATK, i obtain only 40 CNVs, GATK seems to be more stringent in calling variants, and it's not easy to deduce the best results.
You can use the GATK segmentation with PureCN via --fun-segmentation GATK. That should get you close to GATK, but with off-target read support.
PSCBS has a few unique features to combine off-target and on-target reads a bit smarter in the segmentation though. For low purity, I recommend PSCBS, for higher purity (like usually > 30%), GATK is probably cleaner.
It's a balance of sensitivity and specificity. CNVkit is tweaked for sensitivity and expected to give more segments.
yes. maybe by adding another caller I could make a difference: XHMM, Codex2 ?
Describe the issue build PON from multiples samples, all samples are males
To Reproduce Rscript $PURECN/Coverage.R --out-dir $OUTPUTS/normals --bam normals.list --intervals $OUT_REF/hg38_intervals.txt --cores 12 --force ls -a $OUTPUTS/normals/*_loess.txt.gz | cat > PON_normal_coverages.list Rscript $PURECN/NormalDB.R --out-dir $OUT_REF --coverage-files PON_normal_coverages.list --genome hg38 --assay PON --force
How to set up sex male if all my normal samples are males ? I know that in cnvkit tool i can set --male-reference for male when i build reference (PON).
thank you--