lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Purity reliability and the FLAGGED flag #58

Closed tedtoal closed 5 years ago

tedtoal commented 5 years ago

My samples have rather low purity because they are gastric cancer tumors, which are typically diffuse. Consequently, PureCN calls purity = 0.15 for many of them, and many of my samples are FLAGGED (although even some with much higher purity are flagged, and I am not sure why). Not a single sample is marked FAILED, but I believe the manual says somewhere that that flag is meant for use during manual curation.

Despite the FLAGGED samples, the CNV results appear to be good for most of them. For some, however, they may not be. I have one sample that looks very marginal, and it was called as purity 0.15. When I ran it through my own experimental algorithm for calling copy number and purity, I found that I had excluded it because of some problem, probably too few SNVs.

I have two questions. First, do you think I might be able to lower the lowest purity threshold for PureCN and get better purity estimates for some samples? Second, is there some way I could get out of PureCN an estimate of how reliable it thinks its purity call is?

tedtoal commented 5 years ago

Sorry, I really should have posted this on the forum instead of filing it as a bug; I wasn't paying enough attention. It could be labeled as an enhancement suggestion, though.

lima1 commented 5 years ago

Good questions. I use 0.1 as the minimum purity for cfDNA. Below that, you really need very clean data. Ploidy inference below 0.1 is pretty much impossible with hybrid capture data. Yes, the flagging definitely needs to be improved and is probably the number 1 request now. The most important flag is NON-ABERRANT. If you get that one, the purity estimate is completely unreliable and the true purity is likely <5%. LOW PURITY basically just means "have a look". If the sample is noisy (which the flag does not test), then the signal-to-noise ratio might be too small.

See also https://github.com/lima1/PureCN/issues/11
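The flag priorities described above can be turned into a simple triage rule. The following is only a minimal sketch, not part of PureCN: it assumes the per-sample results have been exported to a CSV with columns named `Sampleid`, `Purity`, and `Comment` (flags separated by `;`), and those column names, the `triage` helper, and the example rows are all illustrative assumptions that should be checked against your own output.

```python
import csv
import io

# Assumed column names for a per-sample summary CSV; verify against your files.
SAMPLE_COL = "Sampleid"
PURITY_COL = "Purity"
FLAG_COL = "Comment"  # assumed to hold flags separated by ';'

# The 0.1 floor mentioned in the reply above; treat anything lower as suspect.
MIN_TRUSTED_PURITY = 0.10


def triage(row):
    """Classify one result row as 'unreliable', 'review', or 'ok'."""
    raw_flags = row.get(FLAG_COL) or ""
    flags = {f.strip().upper() for f in raw_flags.split(";") if f.strip()}
    purity = float(row[PURITY_COL])
    # NON-ABERRANT is the most important flag: the purity estimate is unreliable.
    if "NON-ABERRANT" in flags:
        return "unreliable"
    # LOW PURITY only means "have a look", so send it to manual review.
    if "LOW PURITY" in flags or purity < MIN_TRUSTED_PURITY:
        return "review"
    return "ok"


def triage_csv(text):
    """Map each sample ID in a CSV string to its triage status."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[SAMPLE_COL]: triage(row) for row in reader}


# Made-up rows for illustration only; these are not real results.
EXAMPLE = """Sampleid,Purity,Comment
S1,0.15,LOW PURITY
S2,0.15,LOW PURITY;NON-ABERRANT
S3,0.62,
"""

if __name__ == "__main__":
    for sample, status in triage_csv(EXAMPLE).items():
        print(sample, status)
```

Note that this rule does not measure noise, which, as stated above, the flags do not test either; a sample landing in "review" still needs a look at its plots.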

tedtoal commented 5 years ago

I have never seen NON-ABERRANT.

lima1 commented 5 years ago

Closing it now, keep an eye on #11 for related improvements.