lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Normal tissue vs. normal blood #154

Closed by djb17 3 years ago

djb17 commented 3 years ago

Hello again,

I was testing PureCN on TCGA samples and noticed a glaring difference from the tumor purity previously reported in this paper (figures 1, 2; supp. figures 2, 3). In short, the ovarian cohort I examined with PureCN came out at roughly 40% purity, whereas the previously reported consensus was around 80-90%.

To provide a little more detail, I used the matched blood normal samples to generate coverages, making sure the kits were consistent. I ran the recommended production pipeline (without providing normal coverage).

Now I'm wondering if I should be generating the coverages using the tissue normal to see if changing the normal reference significantly affects tumor purity estimation.

Thank you.

lima1 commented 3 years ago

Hi, we ran PureCN on OVC here: https://ascopubs.org/doi/suppl/10.1200/CCI.19.00130

I think there is a Supplemental Table with our purities and ploidies. Does it look very different from what you get?

djb17 commented 3 years ago

Hi Markus. Quick as usual! I still have a handful of samples running, but these numbers seem very close to what I'm getting.

From what I skimmed in the paper I mentioned, they employ 4 different approaches and normalize the purities to come up with a consensus value. I guess I'll have to look closely at their methods, but it seems odd that our estimate is way off compared to what they reported.

lima1 commented 3 years ago

Table S1 lists the values from ABSOLUTE and FACETS as well. You can also check for the TP53 somatic mutation that all of them have. 40% purity for OVC should be on the lower tail of all samples, so they might be right. Note that ABSOLUTE is from SNP6 and those tissue slides could have a slightly different purity.

If you post a screenshot of the B-allele frequency plot and the output of the log file, I can easily double check.

To your initial question: use as many normal samples (i.e. samples without somatic copy number alterations) as you can get. By tissue normal I assume you mean the matched normal? I don't think TCGA provides a lot of adjacent normals; most should be blood. Blood is fine and usually better quality. It might miss some tissue- or FFPE-specific noise, but there is not much you can do about that.

Providing the matched normal via --normal should almost always produce way worse results than using the normal database.

djb17 commented 3 years ago

Below is the figure from the paper I mentioned in case you're interested. They reported the mean consensus purity for OVC around 90% which is why I brought this matter up.

[Figure: aran_et_al_2015_fig1b]

lima1 commented 3 years ago

Yes, this is expected, and PureCN should give you high values for most samples. Have a look at Table S1 in our paper. Here is the corresponding figure.

lima1 commented 3 years ago

I might have misunderstood. I thought you ran PureCN on TCGA samples and got around 40%. Do you mean your own cohort is much lower? It’s usually obvious from the B-allele frequency plot whether the maximum likelihood solution is correct. Feel free to post one where you are unsure. Also maybe check the TP53 allele frequency.
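As a rough illustration of the TP53 allele frequency check, here is a minimal Python sketch of the standard relationship between purity and the expected allele frequency of a clonal somatic mutation. It is not PureCN code. The function names and the copy number states in the example are hypothetical and chosen only for illustration, and it assumes the mutation is clonal and the normal cells are diploid at the locus.

```python
def expected_vaf(purity, tumor_cn, multiplicity, normal_cn=2):
    """Expected allele frequency of a clonal somatic mutation.

    purity       : fraction of tumor cells in the sample (0..1)
    tumor_cn     : total copy number at the locus in tumor cells
    multiplicity : number of those copies carrying the mutation
    normal_cn    : copy number in normal cells (assumed diploid)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if multiplicity < 0 or multiplicity > tumor_cn:
        raise ValueError("multiplicity must be between 0 and tumor_cn")
    mutant_copies = purity * multiplicity
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    if total_copies == 0:
        raise ValueError("no copies of the locus in the sample")
    return mutant_copies / total_copies


def purity_from_vaf(vaf, tumor_cn, multiplicity, normal_cn=2):
    """Invert expected_vaf: purity implied by an observed allele frequency."""
    denominator = multiplicity - vaf * (tumor_cn - normal_cn)
    if denominator <= 0:
        raise ValueError("VAF is not compatible with this copy number state")
    return normal_cn * vaf / denominator


if __name__ == "__main__":
    # Hypothetical states: (total copy number, mutated copies)
    states = {
        "heterozygous, 2 copies": (2, 1),
        "LOH by deletion, 1 copy": (1, 1),
        "copy-neutral LOH, 2 copies": (2, 2),
    }
    for purity in (0.4, 0.9):
        for label, (cn, mult) in states.items():
            vaf = expected_vaf(purity, cn, mult)
            print(f"purity={purity:.1f}  {label}: expected VAF={vaf:.2f}")
```

If the observed TP53 allele frequency sits far above what a 40% purity solution allows under any plausible copy number state, that suggests the higher-purity solution is the right one, and vice versa.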