npatel-ah closed this issue 4 years ago
Not sure. Hclust is the best segmentation method when you provide a segmentation file. chrX should be ignored automatically, but can you try removing it from the segmentation? PureCN assumes that males are normalized with male references and females with female references (the PureCN internal normalization/segmentation does this automatically). You can also try setting --sex M, which should make PureCN ignore chrX.
Do you see this line in all samples:
INFO [2020-06-22 18:32:17] Ratio of mean on-target vs. off-target read counts: NaN
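For anyone following along, the chrX removal suggested above can be sketched in a few lines of Python. This is only an illustration, not part of PureCN or CNVkit: the function name `drop_chromosomes` is made up, and the column names (`ID`, `chrom`, `loc.start`, `loc.end`, `num.mark`, `seg.mean`) are an assumption about a typical tab-separated SEG layout, so check them against your own file's header.

```python
import csv
import io

# Hypothetical example data; the column names are an assumption about the SEG header.
EXAMPLE_SEG = (
    "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n"
    "Sample1\tchr1\t1000\t50000\t120\t0.05\n"
    "Sample1\tchrX\t2000\t90000\t80\t-0.95\n"
    "Sample1\tchr2\t3000\t70000\t95\t0.30\n"
)


def drop_chromosomes(seg_text, drop=("chrX", "X"), chrom_col="chrom"):
    """Return the SEG text without rows whose chromosome is listed in `drop`."""
    reader = csv.DictReader(io.StringIO(seg_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Keep every segment that is not on a dropped chromosome.
        if row[chrom_col] not in drop:
            writer.writerow(row)
    return out.getvalue()


filtered_seg = drop_chromosomes(EXAMPLE_SEG)
print(filtered_seg)
```

To use it on a real file, read the file contents into a string, pass them to the function, and write the result to a new file so the original segmentation stays untouched.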
Thanks for the quick response. I removed chrX from the segmentation file and set --sex M, but it still doesn't work.
PureCN assumes that males are normalized with male references and females with female references (the PureCN internal normalization/segmentation does this automatically).
I'm not sure if CNVkit does that. I think I will try generating the segmentation with PureCN and hope that solves the issue. I am curious about what makes the quality check fail here. I looked through many of PureCN's issues on GitHub, and a noisy log-ratio usually seems to be the culprit, but that doesn't seem to be the case here: the mean SD of the log-ratio for this sample is very similar to that of many other samples. Even the scatter plot from CNVkit looks quite clean. Do you agree?
Do you see this line in all samples:
INFO [2020-06-22 18:32:17] Ratio of mean on-target vs. off-target read counts: NaN
Yes, this is the case for all of my tumor-only samples.
Hmmm, can you create a minimal example to reproduce? Like only the CNVkit output, no VCF or mapping bias. Does it still crash? If yes, can you share this minimal example?
Hello Markus,
I tried your suggestion of running it with just the segmentation files and got the same error. I then managed to run the internal segmentation along with the NormalDb and the other steps, and I am planning to continue with that. I also tried the PSCBS method without any issues.
If you would still like to troubleshoot the CNVkit issue, I have attached the cns and seg files from CNVkit for the sample.
Sample1_CNVkit.seg.txt Sample1_CNVkit.cns.txt
Thanks for all of your help, and let me know if you need more information.
Best, Nihir
Great. Otherwise, does the concordance between CNVkit and PureCN look good?
Thanks for sharing the files, will look into it.
Looks like the issue was 17 intervals with very small log2 ratios of about -15. Not sure how this happens in CNVkit, but ignoring them makes it run through.
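One way to ignore such intervals before handing the file to PureCN is to filter them out by their log2 value. The sketch below is a rough illustration rather than anything PureCN or CNVkit provides: `filter_low_log2` is a made-up helper, the cutoff of -8 is an arbitrary choice for the example, and the header (`chromosome`, `start`, `end`, `gene`, `depth`, `log2`, `weight`) is an assumption about the CNVkit coverage-ratio file layout, so adjust the column name if your file differs.

```python
import csv
import io

# Hypothetical example data; the header is an assumption about the CNVkit ratio file.
EXAMPLE_CNR = (
    "chromosome\tstart\tend\tgene\tdepth\tlog2\tweight\n"
    "chr1\t1000\t1200\tGENE_A\t250.0\t0.12\t0.95\n"
    "chr1\t5000\t5200\tGENE_B\t0.01\t-15.3\t0.40\n"
    "chr2\t3000\t3200\tGENE_C\t180.0\t-0.45\t0.90\n"
)


def filter_low_log2(cnr_text, min_log2=-8.0, log2_col="log2"):
    """Split intervals into kept text and dropped rows, using `min_log2` as the cutoff."""
    reader = csv.DictReader(io.StringIO(cnr_text), delimiter="\t")
    kept, dropped = [], []
    for row in reader:
        # Intervals with an extremely low log2 ratio are set aside, not written out.
        if float(row[log2_col]) >= min_log2:
            kept.append(row)
        else:
            dropped.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue(), dropped


filtered_cnr, dropped_rows = filter_low_log2(EXAMPLE_CNR)
print(f"Dropped {len(dropped_rows)} interval(s) with log2 below the cutoff")
print(filtered_cnr)
```

Returning the dropped rows alongside the filtered text makes it easy to check how many intervals were affected and where they sit before deciding whether the cutoff is sensible for your data.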
Yup, the concordance was great, but I do believe PureCN performed better. With CNVkit, three of the 17 samples showed purity > 0.5, which seemed high compared to the rest, which had purity between 0.15 and 0.25. When analyzed with PureCN, 2 of those 3 samples were predicted to have purity around 0.2, which was the expected behavior. The remaining sample had some contamination, so I am not surprised that its purity is off.
Thanks a lot for the detective work on CNVkit. Just to clarify for others: when you said small log2 ratio, that's in the CNVkit.cns file, correct? I ask because anything below -8 in "CNVkit.seg" should already be ignored by PureCN.
Thanks, Nihir
Yes, exactly, in the --tumor file (I think the proper file suffix is *.cnr though).
Great, if you are unsure about the discordant samples, feel free to post the B-allele plot. 0.2 vs 0.5 is a pretty dramatic difference and should be obvious who is right. But you probably figured that out already.
Hello,
PureCN worked well for many samples, but for one of them it throws an error. I have attached the log file here: Sample_1-DNA.PureCN2.log
This is a tumor-only sample from a panel of ~400 genes.
The segmentation looks fine in CNVkit's scatter plot, which is also attached: Scatter.pdf
I also tried to use the PSCBS algorithm but got the error below.
What can be done to obtain a ploidy/purity estimate?
Thanks, Nihir