Closed ccruizm closed 1 month ago
The right thing to do is to run pileup_and_phase.R
jointly for all samples belonging to the same patient. Please refer to the docs for how to do this:
https://kharchenkolab.github.io/numbat/articles/numbat.html#preparing-data
Renaming clashing barcodes is not necessary, as the bulk aggregate is used for genotyping.
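As a rough illustration of the joint run described above, here is a minimal sketch that assembles a single pileup_and_phase.R invocation for several samples from one patient. It assumes the script accepts comma-separated lists for --samples, --bams and --barcodes, as described in the linked docs; please verify the flags against your installed version. The helper name build_pileup_command and all file paths and sample names are hypothetical placeholders.

```python
import shlex


def build_pileup_command(label, samples, outdir, gmap, snpvcf, paneldir,
                         ncores=4, script="pileup_and_phase.R"):
    """Build one joint command for all samples of a single patient.

    `samples` is a list of (sample_name, bam_path, barcodes_path) tuples.
    Flag names are assumed from the Numbat docs; check them locally.
    """
    names = ",".join(s[0] for s in samples)
    bams = ",".join(s[1] for s in samples)
    barcodes = ",".join(s[2] for s in samples)
    return [
        "Rscript", script,
        "--label", label,        # one label per patient
        "--samples", names,      # comma-separated, same order as BAMs
        "--bams", bams,
        "--barcodes", barcodes,
        "--gmap", gmap,
        "--snpvcf", snpvcf,
        "--paneldir", paneldir,
        "--outdir", outdir,
        "--ncores", str(ncores),
    ]


# Hypothetical inputs: two tumor areas from the same patient.
cmd = build_pileup_command(
    label="patient1",
    samples=[
        ("tumorA", "data/tumorA.bam", "data/tumorA_barcodes.tsv"),
        ("tumorB", "data/tumorB.bam", "data/tumorB_barcodes.tsv"),
    ],
    outdir="out/patient1",
    gmap="ref/genetic_map.txt.gz",
    snpvcf="ref/snps.vcf",
    paneldir="ref/panel",
)
print(shlex.join(cmd))  # inspect the command before running it
```

The command is only printed here, not executed, so it can be reviewed or pasted into a job script.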
Good day!
I have read that it is possible to run Numbat on several samples (#111, #142). I have been trying to apply it to my data, where different samples come from the same patient (different tumor areas). In this case, I run
pileup_and_phase.R
on each sample individually and then merge the allele_counts.tsv.gz
files into one data frame. However, when running run_numbat
I get Error in check_allele_df(df_allele): Inconsistent SNP genotypes; Are cells from two different individuals mixed together?
I checked, and after merging both df_allele and running.
I indeed see that some snp_id have n=2 (which causes the error). Could you please tell me how I should use Numbat with multiple samples? Do I need to run
pileup_and_phase.R
for all the samples of interest? I have overlapping barcodes among samples. Do I need to rename them in the BAM file? Or is there an easier way to do it that I am missing? Thanks in advance!
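For reference, the kind of check behind the n=2 observation can be sketched as follows. This is only an approximation of what check_allele_df appears to test, not Numbat's actual implementation; the column names snp_id and GT are assumed from the allele count table, and the example rows are made up. A plausible reason for conflicts is that each sample was genotyped and phased independently, so the same SNP can end up with a different phased genotype in each table.

```python
from collections import defaultdict


def inconsistent_snps(rows):
    """Return SNPs that carry more than one distinct genotype.

    `rows` is an iterable of dicts with at least `snp_id` and `GT` keys,
    e.g. rows of the merged allele count table. Empty genotypes are ignored.
    """
    genotypes = defaultdict(set)
    for row in rows:
        gt = row["GT"]
        if gt:
            genotypes[row["snp_id"]].add(gt)
    # Keep only SNPs with conflicting genotype calls (n > 1).
    return {snp: sorted(gts) for snp, gts in genotypes.items() if len(gts) > 1}


# Made-up merged table: the first SNP was phased differently in each sample.
merged = [
    {"cell": "AAAC-1", "snp_id": "1_1000_A_G", "GT": "0|1"},  # sample 1
    {"cell": "AAAC-1", "snp_id": "1_1000_A_G", "GT": "1|0"},  # sample 2
    {"cell": "AAAG-1", "snp_id": "1_2000_C_T", "GT": "1|0"},  # sample 1
    {"cell": "AAAT-1", "snp_id": "1_2000_C_T", "GT": "1|0"},  # sample 2
]

conflicts = inconsistent_snps(merged)
print(conflicts)  # {'1_1000_A_G': ['0|1', '1|0']}
```

Note that the first two rows also share a cell barcode across samples, which is harmless for this check because it only groups by snp_id.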