lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

purity and ploidy corrected CN #360

Closed jennyp76 closed 5 months ago

jennyp76 commented 6 months ago

Hi, I would really appreciate your help. Even though I have read through all the issues related to ploidy and purity adjustment, I'm still confused.

  1. In issue #40, you mentioned that PureCN provides "ploidy AND purity-adjusted CN". So do you mean that C and M in the {sample_id}_genes.csv file are "ploidy AND purity-adjusted"? Is the ploidy value in {sample_id}.csv calculated presuming the purity of the tumor sample is 1? From #40, it would be 'T' (tumor ploidy).

  2. Yet the seg.mean in {sample_id}_dnacopy.seg is adjusted for NEITHER ploidy NOR purity? Hasn't the log2 ratio already been adjusted for ploidy and purity?

  3. So, if I want to use seg.mean for GISTIC2.0, I would have to calculate the purity-adjusted log2 ratio myself using the formula mentioned in https://www.nature.com/articles/ng.2760 (section Impurity-corrected GISTIC)?

I'm sorry for asking basic questions. Thanks in advance. Jen

lima1 commented 6 months ago

Hi Jen.

PureCN indeed does not adjust the tumor vs normal log ratio. All it is doing is a bit of cleaning up and alignment of on-target and off-target (which is then the seg.mean). Given that this comes up like once a year, it might make sense to add a function here. I don't need that function, so it's not there yet! For now, if you follow #40, you should be fine. I can confirm that the formula in the paper is wrong and needs to be corrected.

Everything else, C (total copy number), M (minor copy number) are adjusted for the correct purity AND ploidy. That's basically what PureCN is doing, calculating those allele-specific absolute copy numbers.

Hope that helps, Markus

jennyp76 commented 6 months ago

That sure helps me a lot. Thanks for the quick reply.

One last thing. If I plan to adjust the seg.mean, which is log2-transformed, should I use the value before log2 transformation as the 'raw (observed) coverage ratio', and then log-transform the result again to get the (log2() -1 of copy number) value? Or am I wrong?

Thanks

lima1 commented 6 months ago

I just added a function adjustLogRatio() to the issue_40 branch. It has an argument is.log2 that is TRUE by default. Let me know how it goes and I'll add it to the devel branch.
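
The workflow asked about above (undo the log2, adjust, then log2 again) can be sketched in Python as follows. This is only an illustration and not PureCN's R code: it assumes the common mixture model in which the observed ratio equals (purity * C + 2 * (1 - purity)) / (purity * ploidy + 2 * (1 - purity)), and the name `adjust_log_ratio`, its parameters, and the floor value are hypothetical choices made for this example.

```python
import math


def adjust_log_ratio(ratio, purity, ploidy, is_log2=True, min_ratio=2 ** -8):
    """Return a purity- and ploidy-adjusted tumor vs. normal ratio.

    Hypothetical helper for illustration only. Assumes the observed
    ratio follows the mixture model
        (purity * C + 2 * (1 - purity)) / (purity * ploidy + 2 * (1 - purity))
    where C is the tumor copy number of the segment.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")

    # Step 1: go back to the linear scale if the input is a log2 ratio.
    observed = 2 ** ratio if is_log2 else ratio

    # Step 2: solve the mixture model for the tumor copy number C.
    normal_part = 2 * (1 - purity)
    tumor_cn = (observed * (purity * ploidy + normal_part) - normal_part) / purity

    # Step 3: express C relative to tumor ploidy and apply a floor so that
    # zero or slightly negative values do not break the log transform.
    adjusted = max(min_ratio, tumor_cn / ploidy)

    # Step 4: return on the same scale as the input.
    return math.log2(adjusted) if is_log2 else adjusted


# A 4-copy segment in a diploid tumor at 50% purity is observed at ratio 1.5.
print(adjust_log_ratio(math.log2(1.5), purity=0.5, ploidy=2))
```

With these assumptions, a pure sample is returned unchanged, and a homozygous deletion is clamped to the floor instead of producing log2(0).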

jennyp76 commented 6 months ago

Yeah, sure, that function would help a lot of people in the future. THANKS FOR YOUR WORK! Could you tell me the rationale for setting min.ratio to the specific value 0.004?

lima1 commented 6 months ago

I use -8 as the minimum log2 ratio in other parts, and 0.004 is roughly 2^-8. If that causes problems, let me know. Basically -8 is the smallest possible log ratio in a 99% purity sample.
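
As a rough check on that floor, the lowest ratio a segment can reach is the one for a homozygous deletion (tumor copy number 0). The Python sketch below computes it under the same assumed mixture model, ratio = 2 * (1 - purity) / (purity * ploidy + 2 * (1 - purity)); the helper name `min_log2_ratio` is hypothetical and the result depends on the tumor ploidy that is plugged in.

```python
import math


def min_log2_ratio(purity, ploidy):
    """Log2 ratio of a homozygous deletion under the assumed mixture model.

    Hypothetical helper: only the normal cells (2 copies, fraction
    1 - purity) contribute coverage to the numerator.
    """
    if not 0 < purity < 1:
        raise ValueError("purity must be strictly between 0 and 1")
    normal_part = 2 * (1 - purity)
    return math.log2(normal_part / (purity * ploidy + normal_part))


# 2^-8 on the linear scale, which rounds to the 0.004 discussed above.
print(2 ** -8)

# Floor at 99% purity for a few assumed tumor ploidies.
for ploidy in (2, 3, 4, 5):
    print(ploidy, round(min_log2_ratio(0.99, ploidy), 2))
```

Under these assumptions the value at 99% purity is about -6.6 for a diploid tumor and approaches -8 for a ploidy of about 5, so -8 acts as a conservative lower bound.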

jennyp76 commented 6 months ago

Okay! Thank you

lima1 commented 6 months ago

Will change the default to 2^-8. 0.004 is indeed a bit confusing.