harbourlab / uphyloplot2

Draw phylogenetic trees of tumor evolution

Mapping subclones loss/gains #16

Open angkoomin opened 2 years ago

angkoomin commented 2 years ago

Hi there,

I was wondering how I can manually curate the losses/gains from the HMM.pred_cnv_regions.dat file using the cell_groupings file. How do I match them to each branch of the phylogenetic tree?


Thanks

stasvolik commented 2 years ago

I am very interested in this problem as well.

angkoomin commented 2 years ago

Unfortunately, I'm still asking around about how to map the subclones. Please feel free to share any thoughts you have, as I have been stuck on this issue for the past month.

Silvia-Bio commented 1 year ago

Check out the "inferCNV-postprocess.r.txt" script here. You might have to adjust a few things but it can be used to achieve what you asked.

starfallin commented 11 months ago

> Check out the "inferCNV-postprocess.r.txt" script here. You might have to adjust a few things but it can be used to achieve what you asked.

It really works! Thank you so much. Could you please tell me why you chose state 4 as the cut-off when determining the gain or loss event during creation of the final_cnv file? I also notice that authors usually use "LOH" instead of "loss" to describe the CNV event. Why is that? Sincerely

Silvia-Bio commented 11 months ago

Hi @starfallin,

I'm glad you found the script useful! I am not sure I'll be able to answer your questions correctly, but I will try.

inferCNV doesn't provide exact copy numbers. Instead, it classifies genomic windows into categories (neutral, gain or loss), which are then encoded numerically as the states of its 6-state HMM (i6) model. The neutral (diploid) state is State 3, losses are States 1 and 2, and gains are States 4 to 6, depending on the number of copies gained. So, in the "inferCNV-postprocess.r.txt" script, this line "final_cnv$event = ifelse(final_cnv$state >=4, "gain", "loss")" is designed to categorise genomic regions with a State of 4 or higher as gains, and anything lower as a loss. Note that this would also label a State 3 region as a loss, so it relies on the input listing only non-neutral regions.
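
To make the cut-off concrete, here is a minimal sketch of the same classification. It is written in Python purely for illustration and is not part of the script; the names `I6_STATE_LABELS`, `classify_state` and `classify_like_script` are hypothetical. The state meanings in the comments follow the inferCNV i6 HMM description.

```python
# Illustrative only: maps inferCNV i6 HMM states to event labels.
# All names below are hypothetical and do not come from the script.
I6_STATE_LABELS = {
    1: "loss",     # complete loss
    2: "loss",     # loss of one copy
    3: "neutral",  # no change
    4: "gain",     # gain of one copy
    5: "gain",     # gain of two copies
    6: "gain",     # gain of more than two copies
}


def classify_state(state: int) -> str:
    """Map an i6 state (1-6) to 'loss', 'neutral' or 'gain'."""
    if state not in I6_STATE_LABELS:
        raise ValueError(f"unexpected i6 state: {state!r}")
    return I6_STATE_LABELS[state]


def classify_like_script(state: int) -> str:
    """Mirror of the R line: ifelse(state >= 4, "gain", "loss")."""
    return "gain" if state >= 4 else "loss"


if __name__ == "__main__":
    # Print both labellings side by side for every state.
    for s in range(1, 7):
        print(s, classify_state(s), classify_like_script(s))
```

The two functions agree for every state except 3, which the two-way split labels as a loss. If neutral regions can appear in your input, it may be safer to filter them out first or use a three-way mapping like the one above.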

About your second question, the term "LOH" specifically refers to a situation where one of two alleles at a locus is lost. When discussing CNV events in the context of inferCNV, which works with RNA data and provides categorical states like I mentioned above, the term "loss" might be more appropriate because it broadly denotes a decrease in copy number. Using "LOH" might not be accurate in this context, especially given the inherent limitations and inaccuracies of CNV calling from RNA data compared to DNA-based methods.

Hope something here helps!

KR,

S.

rstagnit commented 2 months ago

Hi @Silvia-Bio,

I appreciate you sharing your script. It has worked as anticipated until the following step: cell_group = read.csv("12_HMM_preds.cell_groupings.csv", header = F)

Given the naming convention, I assumed that to be an inferCNV output file, but it is not among my outputs from infercnv_1.19.0. Could you provide an example of that file, so that I could perhaps create one from the outputs I do have?

R