lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Cannot find valid purity/ploidy solution #127

Closed npatel-ah closed 4 years ago

npatel-ah commented 4 years ago

Hello,

PureCN worked well for many samples, but for one sample it throws an error. I have attached the log file here: Sample_1-DNA.PureCN2.log

This is a tumor-only sample from a panel of ~400 genes.

The segmentation seems fine from CNVkit's plot, also attached. Scatter.pdf

I also tried the PSCBS algorithm but got the error below.

INFO [2020-06-22 19:11:50] Re-centering provided segment means (offset -0.0535).
INFO [2020-06-22 19:11:50] Using unweighted PSCBS.
INFO [2020-06-22 19:11:50] Setting undo.SD parameter to 0.750000.
INFO [2020-06-22 19:12:06] Setting prune.hclust.h parameter to 0.150000.
Error in hclust(dist(dx), method = method) :
  NA/NaN/Inf in foreign function call (arg 10)
Calls: runAbsoluteCN -> do.call -> <Anonymous> -> .pruneByHclust -> hclust

What can be done to obtain ploidy/purity estimates?

Thanks, Nihir

lima1 commented 4 years ago

Not sure. Hclust is the best segmentation method when you provide a segmentation file. chrX should be ignored automatically, but can you try removing it from the segmentation? PureCN assumes that males are normalized with male references and females with female references (the PureCN internal normalization/segmentation does this automatically). You can also try setting --sex M, which should ignore chrX.

Do you see this line in all samples: INFO [2020-06-22 18:32:17] Ratio of mean on-target vs. off-target read counts: NaN
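For readers who want to try the chrX suggestion above, here is a minimal sketch of dropping chrX rows from a tab-separated segmentation file before handing it to PureCN. It is not part of PureCN or CNVkit. The function name drop_chromosomes is hypothetical, and the column name chrom and the header layout in the example are assumptions about the SEG file, so adjust them to match your own file.

```python
import csv
import io


def drop_chromosomes(seg_text, chrom_column="chrom", drop=("chrX", "X")):
    """Return tab-separated SEG text without rows on the given chromosomes.

    chrom_column and the default names in `drop` are assumptions about the
    input file, not values taken from PureCN or CNVkit.
    """
    reader = csv.DictReader(io.StringIO(seg_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Keep every segment that is not on a chromosome we want to drop.
        if row[chrom_column] not in drop:
            writer.writerow(row)
    return out.getvalue()


# Made-up example data, for illustration only.
example_seg = (
    "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n"
    "Sample1\tchr1\t100\t200\t10\t0.1\n"
    "Sample1\tchrX\t100\t200\t5\t-0.9\n"
)
filtered_seg = drop_chromosomes(example_seg)
print(filtered_seg)
```

To use it on a real file, read the file into a string, pass it through drop_chromosomes, and write the result to a new path so the original segmentation stays untouched.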

npatel-ah commented 4 years ago

Thanks for the quick response. I removed chrX from the segmentation file and set --sex M, but it still doesn't work.

PureCN assumes that males are normalized with male references and females with female references (the PureCN internal normalization/segmentation does this automatically).

Not sure if CNVkit does that. I think I will try generating the segmentation with PureCN and hope that solves the issue. I am curious which quality failure triggers this behavior. I looked through many of PureCN's issues on GitHub, and it seems the log-ratio is usually the culprit, but that doesn't seem to be the case here: the mean SD of the log-ratio for this sample is very similar to that of many other samples. Even the scatter plot from CNVkit seems quite clean. Do you agree?

Do you see this line in all samples:
INFO [2020-06-22 18:32:17] Ratio of mean on-target vs. off-target read counts: NaN

Yes, this is the case for all of my tumor-only samples.

lima1 commented 4 years ago

Hmmm, can you create a minimal example to reproduce? Like only the CNVkit output, no VCF or mapping bias. Does it still crash? If yes, can you share this minimal example?

npatel-ah commented 4 years ago

Hello Markus,

So I tried your suggestion of running it with just the segmentation files and ended up getting the same error. Then I managed to run the internal segmentation along with NormalDb and the other steps, and I certainly plan to continue with it. I also tried the PSCBS method without any issue.

If you would still like to troubleshoot the CNVkit issue, I have attached the cns and seg files from CNVkit for the sample.

Sample1_CNVkit.seg.txt Sample1_CNVkit.cns.txt

Thanks for all of your help, and let me know if you need more information.

Best, Nihir

lima1 commented 4 years ago

Great. The concordance between cnvkit and PureCN looks good, otherwise?

Thanks for sharing the files, will look into it.

lima1 commented 4 years ago

Looks like the issue was 17 intervals with a very small log2-ratio of about -15. Not sure how this happens in CNVkit, but ignoring them makes it run through.
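As an illustration of "ignoring them", the following sketch separates bins with extremely low or non-finite log2-ratios from the rest of a CNVkit-style bin-level table. It is not the fix that was applied inside PureCN. The function names, the cutoff of -8, and the column name log2 are assumptions made for this example, and the input rows are made up.

```python
import csv
import io
import math


def split_extreme_bins(cnr_text, min_log2=-8.0, log2_column="log2"):
    """Split bins into (kept, dropped) by their log2-ratio.

    A bin is dropped when its log2-ratio is non-finite or below min_log2.
    min_log2 and log2_column are illustrative assumptions.
    """
    reader = csv.DictReader(io.StringIO(cnr_text), delimiter="\t")
    kept, dropped = [], []
    for row in reader:
        value = float(row[log2_column])
        if not math.isfinite(value) or value < min_log2:
            dropped.append(row)
        else:
            kept.append(row)
    return reader.fieldnames, kept, dropped


def to_tsv(fieldnames, rows):
    """Serialize rows back to tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up example data, for illustration only.
example_cnr = (
    "chromosome\tstart\tend\tgene\tlog2\tdepth\tweight\n"
    "chr1\t100\t200\tGENE_A\t-0.05\t250.0\t0.9\n"
    "chr1\t300\t400\tGENE_A\t-15.2\t0.01\t0.1\n"
    "chr2\t100\t200\tGENE_B\t0.31\t310.0\t0.95\n"
)
fieldnames, kept, dropped = split_extreme_bins(example_cnr)
print(f"kept {len(kept)} bins, dropped {len(dropped)} bins")
print(to_tsv(fieldnames, kept))
```

Returning the dropped bins alongside the kept ones makes it easy to check how many intervals were removed and where they are before rerunning the analysis.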

npatel-ah commented 4 years ago

Great. The concordance between cnvkit and PureCN looks good, otherwise?

Thanks for sharing the files, will look into it.

Yup, the concordance was great, but I do believe PureCN performed better. Three of the 17 samples showed purity > 0.5, which seemed high compared to the rest, which had purity between 0.15 and 0.25. When analyzed with PureCN, 2 of the 3 samples were predicted to have purity around 0.2, which was the expected behavior. The remaining sample had some contamination, so I am not surprised that its purity is off.

Thanks a lot for the detective work on CNVkit. Just to clarify for others: when you said small log2-ratio, that's in the CNVkit.cns file, correct? Because anything below -8 in "CNVkit.seg" should be ignored by PureCN.

Thanks, Nihir

lima1 commented 4 years ago

Yes, exactly, in the --tumor file (I think the proper file suffix is *.cnr though).

Great, if you are unsure about the discordant samples, feel free to post the B-allele plot. 0.2 vs 0.5 is a pretty dramatic difference, and it should be obvious which one is right. But you probably figured that out already.