kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq
https://kharchenkolab.github.io/numbat/

how to specify the number of clones in the bulk_clones_final.png and look at the allele frequency profile based on the specified number of clones #133

Open rosaranli opened 1 year ago

rosaranli commented 1 year ago

Thank you for this wonderful tool. It's great to have logFC and the allele frequency profile in the same plot, as I previously had to generate them myself with a lot of effort. It has many advantages for calling CNVs, especially copy-neutral loss of heterozygosity.

When I look at bulk_clones_final.png for some samples, especially those with relatively low coverage, the number of clones is higher than expected. Some of the clones appear to have very similar CNV profiles, differing in only one chromosome. Please see the figure below. For example, on chr5p, clones 2 and 3 have a copy gain and clone 4 does not, yet expression in that region looks higher in clone 4. Other examples might be more appropriate, but I can't access those plots right now because our server is down.

Is it possible to specify the number of clones shown in bulk_clones_final.png, or to adjust parameters to reduce these false positives so that fewer clones appear in bulk_clones_final.png? I understand there is a parameter n_cut to specify the number of genotypes, but that's not what I need, as I also want to look at the allele frequency profiles and logFC for the clones. Many thanks!

[Image: Picture1]

teng-gao commented 1 year ago

Hi,

Thanks for the question. The easiest way to achieve this is to fix n_cut in run_numbat (for example, for the above sample you can set n_cut = 1, and you should get only 2 clones in bulk_clones_final.png). Let me know if this works for you.
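For readers following along, the suggestion above might look roughly like the sketch below. The wrapper name run_numbat_fixed_cut and the output directory are hypothetical; count_mat, lambdas_ref and df_allele stand for the usual Numbat inputs, and genome = "hg38" is an assumption to adjust for your data.

```r
# Minimal sketch, not the maintainer's code: a thin wrapper that fixes n_cut.
# The wrapper name and out_dir default are hypothetical.
# count_mat:   gene x cell count matrix
# lambdas_ref: reference expression profile(s)
# df_allele:   per-cell allele counts from the Numbat preprocessing step
run_numbat_fixed_cut <- function(count_mat, lambdas_ref, df_allele,
                                 n_cut = 1,
                                 out_dir = "./numbat_n_cut",
                                 ...) {
    numbat::run_numbat(
        count_mat, lambdas_ref, df_allele,
        genome = "hg38",   # assumption: change if your data use another build
        n_cut = n_cut,     # n_cut = 1 should give 2 clones, per the reply above
        out_dir = out_dir,
        ...                # any other run_numbat arguments (t, ncores, ...)
    )
}

# Usage (requires the numbat package and real input objects):
# run_numbat_fixed_cut(count_mat, ref_hca, df_allele, ncores = 4)
```

Defining the wrapper does not call Numbat; the package is only needed when the function is actually run.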

rosaranli commented 1 year ago

Hi Teng, thank you very much for your suggestion. n_cut=1 did force the number of clones to be 2, so bulk_clones_final.png changed from the following plot

[Image: bulk_clones_final]

to the one below

[Image: bulk_clones_final]

In the latter version of the analysis, where there are two clones, the two clones appear to share exactly the same copy number changes. Can you suggest why that is? I would have assumed that clones are defined by different copy number changes.

I noticed that n_cut=0 is the default for run_numbat, but it may give a different number of clones for different samples. Would it be possible to force it to return just one clone, so I could look at the overall CNV and allele frequency profile? If so, how do I specify the parameters?

If I use the default n_cut=0 but still want fewer clones, does that mean I need to change other parameters to make CNV detection more stringent, for example by decreasing the transition probability t or increasing max_entropy? Please correct me if I am wrong. Are there any other parameters I could adjust to reduce the number of clones?

I am sorry for so many questions and thank you very much in advance.

teng-gao commented 1 year ago

Interesting, could you share more output files when you set n_cut = 1 (feel free to email; tgaoteng@gmail)? I would have expected n_cut = 1 to result in 1 normal and 1 tumor clone.

To just perform a pseudobulk analysis of all (or selected subset of) cells, you can follow the steps of get_bulk and analyze_bulk and visualize using plot_psbulk.
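As a rough illustration of the get_bulk / analyze_bulk / plot_psbulk route mentioned above, here is a hedged sketch. The wrapper name pseudobulk_profile and its cells argument are hypothetical. It also assumes that df_allele has a cell column, that numbat exports the gtf_hg38 annotation, and that t = 1e-5 is a reasonable transition probability; please check these against the documentation of your installed Numbat version.

```r
# Minimal sketch, not the maintainer's code: one pseudobulk profile for all
# (or a chosen subset of) cells. Wrapper name and `cells` argument are
# hypothetical.
pseudobulk_profile <- function(count_mat, lambdas_ref, df_allele,
                               cells = colnames(count_mat),
                               t = 1e-5) {
    # Restrict both inputs to the selected cells
    # (assumes df_allele has a `cell` column holding cell barcodes)
    count_sub  <- count_mat[, cells, drop = FALSE]
    allele_sub <- df_allele[df_allele$cell %in% cells, , drop = FALSE]

    # Aggregate expression and allele counts into a single pseudobulk
    bulk <- numbat::get_bulk(
        count_mat   = count_sub,
        lambdas_ref = lambdas_ref,
        df_allele   = allele_sub,
        gtf         = numbat::gtf_hg38  # assumption: hg38 annotation
    )

    # Run the CNV HMM on the pseudobulk, then plot logFC and allele profile
    bulk <- numbat::analyze_bulk(bulk, t = t)
    numbat::plot_psbulk(bulk)
}

# Usage (requires the numbat package and real input objects):
# p <- pseudobulk_profile(count_mat, ref_hca, df_allele)
```

Because all selected cells are pooled into one profile, this sidesteps clone calling entirely, which is what the "just one clone" question above was after.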

rosaranli commented 1 year ago

Hi Teng, thank you. I only took the epithelial cells (from other analyses we know they should all be tumour cells), so I would assume there are no normal cells, just tumour cells. I did this for two reasons: for some samples the number of cells is very high and the run was really slow; for others the fraction of tumour cells is very low, so I thought using only the epithelial cells (potential tumour cells) would be more accurate. Please correct me if I am wrong. I'll send more files to you by email. Many thanks.

teng-gao commented 1 year ago

I see. I took a look at your results and it seems that the split is due to poor ploidy fit (the tumor has no diploid region). I would follow these steps to perform pseudobulk analysis.

> To just perform a pseudobulk analysis of all (or selected subset of) cells, you can follow the steps of get_bulk and analyze_bulk and visualize using plot_psbulk.

rosaranli commented 1 year ago

Thank you for looking at this. If this is the case, do you recommend always including some normal cells in the clone detection? To identify clones automatically, are there any parameters I could change to reduce the number of clones and avoid false positives?

rosaranli commented 1 year ago

Sorry, another basic question about clone detection: would the method distinguish two clones that have lost different copies of the same chromosome? For example, one clone has lost the paternal copy of chr1 while another clone has lost the maternal copy of chr1?

teng-gao commented 1 year ago

> Thank you for looking at this. If this is the case, do you recommend always including some normal cells in the clone detection? To identify clones automatically, are there any parameters I could change to reduce the number of clones and avoid false positives?

You can consider increasing tau to get fewer clones in general. It is not necessary to include normal cells (except for the genotyping step).
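To give a feel for what increasing tau does: as far as I understand the run_numbat documentation, tau is a factor between 0 and 1 that sets the default max_cost (the likelihood threshold for collapsing internal branches of the phylogeny) as ncol(count_mat) * tau. The sketch below only does that arithmetic; the cell count and the tau values are made up for illustration, and the relationship itself should be checked against your installed version.

```r
# Illustrative arithmetic only; n_cells and the tau values are hypothetical.
# Assumption: run_numbat's default is max_cost = ncol(count_mat) * tau.
n_cells <- 2000
taus <- c(0.3, 0.5, 0.7)

# A larger tau gives a larger max_cost, so more internal branches are
# collapsed and fewer clones are reported
max_cost <- n_cells * taus
print(data.frame(tau = taus, max_cost = max_cost))

# In an actual run, tau would be passed straight to run_numbat, e.g.:
# numbat::run_numbat(count_mat, ref_hca, df_allele, tau = 0.5, ...)
```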

> Sorry, another basic question about clone detection: would the method distinguish two clones that have lost different copies of the same chromosome? For example, one clone has lost the paternal copy of chr1 while another clone has lost the maternal copy of chr1?

Numbat does not explicitly allow this in the single-cell CNV testing and phylogeny model, even with multi_allelic mode. However, the pseudobulk analysis part of Numbat should be able to pick up mirrored CNV events (see Fig. 4 of the paper).