lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Error : PURECN.R #115

Closed 2yuna710 closed 4 years ago

2yuna710 commented 4 years ago

This is the log file. I don't know why I got this error for only one sample out of more than 1,000. Please see the error message below.

INFO [2019-12-26 10:42:52] ------------------------------------------------------------
INFO [2019-12-26 10:42:52] PureCN 1.14.3
INFO [2019-12-26 10:42:52] ------------------------------------------------------------
INFO [2019-12-26 10:42:52] Loading coverage files...
INFO [2019-12-26 10:42:53] Mean target coverages: 62X (tumor) 59X (normal).
WARN [2019-12-26 10:42:53] Allosome coverage missing, cannot determine sex.
WARN [2019-12-26 10:42:53] Allosome coverage missing, cannot determine sex.
INFO [2019-12-26 10:42:54] Removing 487 intervals with missing log.ratio.
INFO [2019-12-26 10:42:54] Removing 19 low/high GC targets.
INFO [2019-12-26 10:42:54] Removing 3933 intervals excluded in normalDB.
INFO [2019-12-26 10:42:54] normalDB provided. Setting minimum coverage for segmentation to 0.0015X.
INFO [2019-12-26 10:42:54] Removing 954 low coverage (< 0.0015X) intervals.
INFO [2019-12-26 10:42:54] Using 18041 intervals (4587 on-target, 13454 off-target).
INFO [2019-12-26 10:42:54] Ratio of mean on-target vs. off-target read counts: 1.71
INFO [2019-12-26 10:42:54] Mean off-target bin size: 194685
INFO [2019-12-26 10:42:54] AT/GC dropout: 0.99 (tumor), 0.98 (normal).
INFO [2019-12-26 10:42:54] Loading VCF...
INFO [2019-12-26 10:42:54] Found 5762 variants in VCF file.
INFO [2019-12-26 10:42:55] 2235 (38.8%) variants annotated as likely germline (DB INFO flag).
INFO [2019-12-26 10:42:55] $sampleID is tumor in VCF file.
INFO [2019-12-26 10:42:55] 2 homozygous and 28 heterozygous variants on chrX.
INFO [2019-12-26 10:42:55] Sex from VCF: F (Fisher's p-value: 0.422, odds-ratio: 0.43).
INFO [2019-12-26 10:42:56] Initial testing for significant sample cross-contamination: maybe
INFO [2019-12-26 10:42:56] Removing 2062 variants with AF < 0.030 or AF >= 0.970 or less than 3 supporting reads or depth < 15.
INFO [2019-12-26 10:42:56] Removing 0 low quality variants with BQ < 25.
INFO [2019-12-26 10:42:56] Total size of targeted genomic region: 0.99Mb (1.38Mb with 50bp padding).
INFO [2019-12-26 10:42:57] 34.3% of targets contain variants.
INFO [2019-12-26 10:42:57] Removing 1405 variants outside intervals.
WARN [2019-12-26 10:42:57] Less than half of variants in dbSNP. Make sure that VCF contains both germline and somatic variants.
INFO [2019-12-26 10:42:57] Setting somatic prior probabilities for dbSNP hits to 0.000500 or to 0.500000 otherwise.
INFO [2019-12-26 10:42:57] VCF does not contain somatic status. For best results, consider providing normal.panel.vcf.file when matched normals are not available.
INFO [2019-12-26 10:42:57] Excluding 1424 novel or poor quality variants from segmentation.
INFO [2019-12-26 10:42:57] Sample sex: ?
INFO [2019-12-26 10:42:57] Segmenting data...
INFO [2019-12-26 10:42:57] Loading pre-computed boundaries for DNAcopy...
INFO [2019-12-26 10:42:57] Setting undo.SD parameter to 1.250000.
INFO [2019-12-26 10:42:59] Setting prune.hclust.h parameter to 0.100000.
INFO [2019-12-26 10:42:59] Found 56 segments with median size of 29.56Mb.
INFO [2019-12-26 10:42:59] Removing 2 variants outside segments.
INFO [2019-12-26 10:42:59] Using 2293 variants.
INFO [2019-12-26 10:42:59] Mean standard deviation of log-ratios: 0.92
INFO [2019-12-26 10:42:59] 2D-grid search of purity and ploidy...
FATAL [2019-12-26 10:43:34] Cannot find valid purity/ploidy solution. This happens when input
FATAL [2019-12-26 10:43:34] segmentations are garbage, most likely due to a catastrophic sample QC
FATAL [2019-12-26 10:43:34] failure. Re-check standard QC metrics for this sample.
FATAL [2019-12-26 10:43:34]
FATAL [2019-12-26 10:43:34] This is most likely a user error due to invalid input data or
FATAL [2019-12-26 10:43:34] parameters (PureCN 1.14.3).

lima1 commented 4 years ago

Hi, the log-ratio standard deviation of 0.92 is extremely high. High-quality samples are below 0.2. Did you check the quality metrics in Picard for this sample? Any red flags?
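
With more than 1,000 samples, it may help to scan every log for this metric before digging into individual failures. Below is a minimal Python sketch, assuming each log contains a line of the form "Mean standard deviation of log-ratios: 0.92" as in the log above. The function names and the command-line usage are hypothetical helpers and not part of PureCN; the 0.2 cutoff is only the rule of thumb mentioned above, not a PureCN default.

```python
import re
import sys
from pathlib import Path

# Matches the line PureCN writes after segmentation, e.g.
# "INFO [2019-12-26 10:42:59] Mean standard deviation of log-ratios: 0.92"
SD_PATTERN = re.compile(r"Mean standard deviation of log-ratios:\s*([0-9]*\.?[0-9]+)")

# Assumption: rule-of-thumb cutoff from this thread, not a PureCN setting.
NOISE_THRESHOLD = 0.2


def log_ratio_sd(log_text):
    """Return the reported log-ratio SD as a float, or None if the line is absent."""
    match = SD_PATTERN.search(log_text)
    return float(match.group(1)) if match else None


def is_noisy(log_text, threshold=NOISE_THRESHOLD):
    """True when the log reports a log-ratio SD at or above the threshold."""
    sd = log_ratio_sd(log_text)
    return sd is not None and sd >= threshold


if __name__ == "__main__":
    # Hypothetical usage: python check_noise.py sample1.log sample2.log ...
    for name in sys.argv[1:]:
        text = Path(name).read_text()
        if is_noisy(text):
            print(f"{name}\t{log_ratio_sd(text)}")
```

Samples flagged this way are candidates for a closer look at their standard QC metrics; the check says nothing about why a sample is noisy.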

Also, consider upgrading to 1.16. See https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html for how to use the PSCBS segmentation, which gives slightly better results, especially in noisy samples.

lima1 commented 4 years ago

Closing for now. Feel free to re-open this issue or open a new one if you think there is a problem with PureCN.