Non-integer copy number results from PureCN

hoonghim commented 5 years ago

Dear Markus,

Hello, I'm using PureCN to analyze the purity, ploidy, and copy number variations in human pancreatic cancer/liver metastases WES data.

When I used the PureCN results as input for PyClone to infer clonal population structure, I noticed that some segments have non-integer copy numbers.

Here is an example. I omitted the first column, which contains the sample ID.

You can see that there are some non-integer values in the C column.

In the PureCN manual (http://bioconductor.org/packages/release/bioc/vignettes/PureCN/inst/doc/PureCN.pdf), C is described as the segment integer copy number.

PureCN.R reports no errors when I run it in normaldb and normal_panel mode.

I have attached a log of the program progress.

run_purecn_with_normal_db_and_normal_panel.PB402-FNA-D.sh.o62523.txt

I don't know why these non-integer copy numbers appear or how I could fix them.

Is it okay for me to round the non-integer copy number?

I hope to use the PureCN results to infer the clonal evolution of my samples.

Sincerely,

Seung-hoon

lima1 commented 5 years ago

Hi Seung-hoon,

Non-integer copy numbers occur for high-level copy number amplifications or sub-clonal gains and losses. The latter are called conservatively, so in lower-coverage WES you will rarely see them.

The reason high-level amplifications are reported like that is simply that PureCN checks all copy states from 0 to max.copy.number (by default 7). Everything that exceeds this number is reported by scaling the measured log-ratio for purity and ploidy.
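
To make the scaling step concrete, here is a minimal Python sketch of the usual purity/ploidy correction of a log2 tumor/normal ratio. It assumes the common mixture model in which normal cells contribute two copies and tumor cells make up a fraction equal to the purity. The function name and the example values are hypothetical, and PureCN's internal implementation may differ in detail.

```python
def scaled_copy_number(log2_ratio, purity, ploidy):
    """Estimate tumor copy number from a log2 tumor/normal coverage ratio.

    Assumed model (not taken from PureCN's source code):
    ratio = (purity * C + 2 * (1 - purity)) / (purity * ploidy + 2 * (1 - purity))
    Solving for C gives the value returned here.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    ratio = 2.0 ** log2_ratio
    normal_part = 2.0 * (1.0 - purity)            # copies contributed by normal cells
    average_copies = purity * ploidy + normal_part  # sample-wide average copy number
    return (ratio * average_copies - normal_part) / purity


# Hypothetical segment: log2 ratio 1.5 at purity 0.6 and tumor ploidy 2.0
print(round(scaled_copy_number(1.5, 0.6, 2.0), 2))  # prints 8.09
```

In this made-up example the estimate is above the default max.copy.number of 7 and is not an integer, which is the kind of value that would show up in the C column.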

Since high-level amplifications are usually very small, it probably won't matter and rounding should be fine. That said, I think the safest option is to simply exclude all variants in segments with a non-integer value.
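
As a rough sketch of the exclusion step before handing data to PyClone, the Python snippet below separates rows with an integer copy number from the rest. The column name "C" follows the table described above; the function name, the tolerance, and the example rows are assumptions for illustration and are not real PureCN output.

```python
import csv
import io


def split_by_integer_copy_number(rows, column="C", tolerance=1e-6):
    """Separate rows whose copy number is an integer from those where it is not."""
    integer_rows, non_integer_rows = [], []
    for row in rows:
        value = float(row[column])
        if abs(value - round(value)) <= tolerance:
            integer_rows.append(row)
        else:
            non_integer_rows.append(row)
    return integer_rows, non_integer_rows


# Made-up segments for illustration only.
example = io.StringIO(
    "chr,start,end,C\n"
    "chr1,100,5000,2\n"
    "chr8,200,900,8.09\n"
    "chr12,300,7000,3\n"
)
kept, dropped = split_by_integer_copy_number(csv.DictReader(example))
print(len(kept), len(dropped))  # prints "2 1"
```

If rounding is preferred over exclusion, the rows in `dropped` could instead have their value replaced with `round(float(row["C"]))` before being merged back in.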

Markus

hoonghim commented 5 years ago

Dear Markus,

Thank you very much for your quick answer.

Now I understand why non-integer copy number happens.

I think I have to take a closer look at the high-level amplification regions because they could be related to tumorigenesis or metastasis.

I also ran Sequenza to check the copy number states and tumor purity, so I can compare the results for the high-level amplification regions.

Again, sincere thanks for your advice.

Sincerely,

Seung-hoon

lima1 commented 5 years ago

If after checking you think these segments are likely artifacts, you can try using the PSCBS segmentation:

# patched PSCBS with support of interval weights
BiocManager::install("lima1/PSCBS", ref="add_dnacopy_weighting")

Then simply add --funsegmentation PSCBS to the PureCN.R call. PSCBS should give improvements especially in lower coverage WES.

lima1 / PureCN

Non-integer copy number results from PureCN #103