raphael-group / hatchet

HATCHet (Holistic Allele-specific Tumor Copy-number Heterogeneity) is an algorithm that infers allele and clone-specific CNAs and WGDs jointly across multiple tumor samples from the same patient, and that leverages the relationships between clones in these samples.
BSD 3-Clause "New" or "Revised" License

clusters are not separated #114

Closed asangphukieo closed 2 years ago

asangphukieo commented 2 years ago

Hi,

I tried running HATCHet on my 3 tumor samples and a matched normal. The clusters are not clearly separated, as shown in the figure below:

image

Then I tried adjusting the -tB and -tR parameters in the range 0.01 to 0.20, but the results were the same. Could you please suggest how to separate the clusters?

Regards, Apiwat

simozacca commented 2 years ago

Thank you for your feedback!

Before evaluating how well separated your clusters are, I think we need to fix a scaling issue in this 2D plot. Because of a few outliers, the y-axis (RDR) is out of scale, so we cannot properly see the RDR values for most of the genomic regions (which have RDR < 2). I would therefore suggest setting a maximum y-axis value and then re-evaluating the clustering. This can be done with the flag --ymax (the flags --ymin, --xmax, and --xmin are similarly available), for example:

--ymax 2.5

Could you please try this option and post the updated results?
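If you are unsure which value to pass to --ymax, one rough way to pick it is to take a high percentile of the RDR values and add some headroom, so that the few outliers no longer set the axis range. The snippet below is only a sketch of that idea: the helper `suggest_ymax`, its `percentile` and `margin` parameters, and the toy data are all made up for illustration and are not part of HATCHet. It assumes you have already read the RDR values of your bins into a plain Python list.

```python
import statistics


def suggest_ymax(rdr_values, percentile=99, margin=1.25):
    """Suggest a y-axis cap that ignores the few largest RDR outliers.

    Hypothetical helper, not a HATCHet function: it takes the given
    percentile of the RDR values and multiplies it by a headroom margin.
    """
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be an integer between 1 and 99")
    # With n=100, quantiles() returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(rdr_values, n=100, method="inclusive")
    return round(cuts[percentile - 1] * margin, 2)


# Toy data: most bins have RDR below 2, while three outliers are far above.
rdr = [0.8 + 0.001 * i for i in range(1000)] + [15.0, 22.0, 40.0]
print("suggested --ymax:", suggest_ymax(rdr))
```

The printed value lands just above the bulk of the bins instead of near the largest outlier, and it can then be passed as the --ymax argument.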

asangphukieo commented 2 years ago

Hi simozacca,

Thank you for your suggestion. Yes, the graph looks much better now!

image

However, one of my samples is still not separated.

image

Could you please suggest how to separate the clusters?

simozacca commented 2 years ago

I am afraid your clusters are simply not separated, especially in the sample at the bottom. In particular, the sequencing signals from your data indicate that tumour purity might be too low, especially in the second sample. HATCHet allows the analysis of low tumour purity samples by jointly leveraging the signal from higher-purity samples, but in this case you do not have high tumour purity samples and the tumour purity seems to be too low anyway for accurate copy number analysis.
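To give a feeling for why low purity makes the clusters collapse, here is a small sketch based on the standard mixture model of a single tumour clone mixed with diploid normal cells. This is a simplification and not HATCHet's actual code: the function names are made up for illustration, and real samples can contain several clones. With purity p and allele-specific copy numbers (a, b), where b is the minor allele, the expected fractional copy number is p(a + b) + 2(1 - p) and the expected BAF is (p·b + (1 - p)) / (p(a + b) + 2(1 - p)).

```python
def mixed_signal(purity, major_cn, minor_cn):
    """Expected fractional copy number and BAF for one tumour clone
    mixed with diploid normal cells (simplified illustrative model)."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    normal = 1.0 - purity
    # Normal cells contribute 2 copies in total, 1 of them on the minor allele.
    fractional_cn = purity * (major_cn + minor_cn) + normal * 2
    if fractional_cn == 0:
        # Homozygous deletion at 100% purity: no reads, so BAF is undefined.
        return 0.0, float("nan")
    baf = (purity * minor_cn + normal * 1) / fractional_cn
    return fractional_cn, baf


def separation(purity, state, reference=(1, 1)):
    """Distance between a copy-number state and a reference state:
    relative shift in read depth and absolute shift in BAF."""
    cn_s, baf_s = mixed_signal(purity, *state)
    cn_r, baf_r = mixed_signal(purity, *reference)
    return abs(cn_s - cn_r) / cn_r, abs(baf_s - baf_r)


# Compare a one-copy deletion (1, 0) with the diploid state (1, 1).
for purity in (0.8, 0.3, 0.1):
    rel_depth, d_baf = separation(purity, state=(1, 0))
    print(f"purity={purity:.1f}  depth shift={rel_depth:.3f}  BAF shift={d_baf:.3f}")
```

In this model both shifts shrink as purity decreases, so at very low purity the clusters for different copy-number states sit almost on top of each other in the RDR/BAF plot and are hard to distinguish from noise.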