kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq
https://kharchenkolab.github.io/numbat/

Running numbat in multiple samples from same patient #176

Closed by ccruizm 1 month ago

ccruizm commented 2 months ago

Good day!

I have read that it is possible to run Numbat on several samples (#111, #142). I have been trying to apply it to my data, where different samples come from the same patient (different tumor areas). In this case, I ran pileup_and_phase.R on each sample individually and then merged the allele_counts.tsv.gz files into one data frame. However, when running run_numbat I get: Error in check_allele_df(df_allele): Inconsistent SNP genotypes; Are cells from two different individuals mixed together?

I checked by merging both df_allele data frames and running:

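library(dplyr)

# df: merged allele counts from both samples
# Count distinct genotypes per SNP; n > 1 means the genotype conflicts across samples
snps_test = df %>%
    filter(GT != '') %>%
    group_by(snp_id) %>%
    summarise(n = n_distinct(GT))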

I do see that some snp_id values have n = 2 (which causes the error). Could you please tell me how I should use Numbat with multiple samples? Do I need to run pileup_and_phase.R for all the samples of interest at once? I have overlapping barcodes among samples. Do I need to rename them in the BAM file? Or is there an easier way that I am missing?

Thanks in advance!

teng-gao commented 1 month ago

The right thing to do is to run pileup_and_phase.R jointly for all samples belonging to the same patient. Please refer to the documentation for how to do this: https://kharchenkolab.github.io/numbat/articles/numbat.html#preparing-data. Renaming clashing barcodes is not necessary, as the bulk aggregate is used for genotyping.
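
For readers landing on this thread, a joint run might look roughly like the sketch below. It only assembles and prints the command so it can be inspected first. The patient label, sample names, and every path are placeholders, and the flag names reflect the usage described in the linked documentation as best recalled, so please confirm them against `pileup_and_phase.R --help` before relying on this.

```shell
#!/bin/sh
# Sketch only: assemble ONE pileup_and_phase.R call covering all samples
# from the same patient. Every name and path below is a placeholder.
set -eu

script="pileup_and_phase.R"     # placeholder: path to the script in your install
label="patient1"                # placeholder patient label
samples="tumor_A tumor_B"       # placeholder sample names
outdir="/out/patient1"          # placeholder output directory
gmap="/ref/genetic_map.txt.gz"  # placeholder genetic map for phasing
snpvcf="/ref/snps.vcf"          # placeholder SNP VCF
paneldir="/ref/panel"           # placeholder phasing reference panel
ncores=4

# Build comma-separated lists, one entry per sample, in matching order
names=""
bams=""
barcodes=""
for s in $samples; do
  names="${names:+$names,}$s"
  bams="${bams:+$bams,}/data/$s/possorted_genome_bam.bam"
  barcodes="${barcodes:+$barcodes,}/data/$s/barcodes.tsv.gz"
done

# Flag names are assumed from the documentation; verify with --help
cmd="Rscript $script --label $label --samples $names --bams $bams --barcodes $barcodes --outdir $outdir --gmap $gmap --snpvcf $snpvcf --paneldir $paneldir --ncores $ncores"

# Dry run: print the assembled command instead of executing it
echo "$cmd"
```

Because all samples go through a single call, genotyping uses the pooled bulk aggregate, which is why clashing barcodes do not need renaming and why each SNP ends up with one consistent genotype.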