kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq

Running numbat in multiple samples from same patient #176

Closed ccruizm closed 1 month ago

ccruizm commented 2 months ago

Good day!

I have read that it is possible to run Numbat on several samples (#111, #142). I have been trying to apply it to my data, where different samples come from the same patient (different tumor areas). In this case, I ran pileup_and_phase.R on each sample individually and then merged the allele_counts.tsv.gz files into one data frame. However, when running run_numbat I get: Error in check_allele_df(df_allele): Inconsistent SNP genotypes; Are cells from two different individuals mixed together?

I checked by merging both df_allele data frames and running:

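library(dplyr)

# Count distinct genotypes per SNP; n > 1 means the merged samples disagree
snps_test = df %>% 
        filter(GT != '') %>% 
        group_by(snp_id) %>%
        summarise(n = length(unique(GT)))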

I indeed see that some snp_id values have n=2 (which causes the error). Could you please tell me how I should use Numbat with multiple samples? Do I need to run pileup_and_phase.R on all the samples of interest together? I have overlapping barcodes among samples. Do I need to rename them in the BAM files? Or is there an easier way to do it that I am missing?

Thanks in advance!

teng-gao commented 1 month ago

The right thing to do is to run pileup_and_phase.R jointly for all samples belonging to the same patient. Please refer to the doc for how to do this. Renaming clashing barcodes is not necessary, as the bulk aggregate is used for genotyping.
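
As a rough sketch of what a joint run could look like: the script takes comma-delimited lists for --samples, --bams and --barcodes, one entry per sample. The sample names, patient label, file paths and core count below are hypothetical placeholders, and the flag names are as I recall them from the Numbat usage text, so please check `Rscript pileup_and_phase.R --help` for your installed version. The snippet only assembles and prints the command (a dry run), so it can be reviewed before anything is executed.

```shell
# Dry-run sketch: build the joint pileup_and_phase.R command and print it.
# Every name and path below is a hypothetical placeholder.
label="patient1"
samples="tumor_A,tumor_B"
bams="/path/to/tumor_A.bam,/path/to/tumor_B.bam"
barcodes="/path/to/tumor_A_barcodes.tsv,/path/to/tumor_B_barcodes.tsv"

# One invocation covering all samples from the same patient, so every SNP
# is genotyped and phased once from the pooled reads.
cmd="Rscript pileup_and_phase.R \
--label $label \
--samples $samples \
--bams $bams \
--barcodes $barcodes \
--gmap /path/to/genetic_map.txt.gz \
--snpvcf /path/to/snps.vcf.gz \
--paneldir /path/to/phasing_panel \
--outdir /path/to/results/$label \
--ncores 4"

# Print the assembled command for review.
echo "$cmd"
```

Once the printed command looks right, it can be pasted into the terminal to run it. Because genotypes are then called once across the pooled samples, the per-sample conflicts in GT that trigger the check_allele_df error should not arise.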