statgen / demuxlet

Genetic multiplexing of barcoded single cell RNA-seq
Apache License 2.0

Discrepancies in demuxlet output with vs. without the pileup step #111

Closed: Marwansha closed this issue 4 months ago

Marwansha commented 4 months ago

Hi,

I am observing differences in the output of demuxlet with exactly the same input (the same BAM file and the same VCF file), depending on whether I run the pileup step first or run demuxlet directly on the BAM file.

The two runs agree on 90% of the calls, but the LLK differs for every cell, even for cells with the same call. I am trying to understand the 10% of cells with different calls: which result is more trustworthy? Any ideas?
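One way to quantify the agreement between the two runs is to cross-tabulate the per-cell calls rather than report a single percentage. This is a minimal sketch, assuming the droplet-type calls from each run have already been loaded into Python dictionaries keyed by cell barcode; the parsing step, the function name `compare_calls` and the toy barcodes are hypothetical and not part of demuxlet.

```python
from collections import Counter


def compare_calls(calls_a, calls_b):
    """Cross-tabulate droplet-type calls from two runs.

    calls_a, calls_b: dicts mapping cell barcode -> call (e.g. "SNG", "DBL", "AMB").
    Only barcodes present in both runs are compared.
    Returns (fraction of shared barcodes with the same call,
             Counter keyed by (call_a, call_b),
             sorted list of barcodes whose calls differ).
    """
    shared = sorted(set(calls_a) & set(calls_b))
    if not shared:
        raise ValueError("no barcodes in common between the two runs")
    crosstab = Counter((calls_a[bc], calls_b[bc]) for bc in shared)
    discordant = [bc for bc in shared if calls_a[bc] != calls_b[bc]]
    agreement = 1 - len(discordant) / len(shared)
    return agreement, crosstab, discordant


# Hypothetical toy calls; real ones would be read from each run's output file.
with_pileup = {"AAAC-1": "SNG", "AAAG-1": "SNG", "AACT-1": "DBL", "AAGT-1": "SNG"}
direct_bam = {"AAAC-1": "SNG", "AAAG-1": "DBL", "AACT-1": "DBL", "AAGT-1": "SNG"}
agreement, crosstab, discordant = compare_calls(with_pileup, direct_bam)
print(f"agreement: {agreement:.0%}")
print(dict(crosstab))
print(discordant)
```

The cross-tabulation shows in which direction the discordant cells move (SNG in one run and DBL in the other, or the reverse), which a single agreement figure hides.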

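Here are the cells that differ: the pileup-based run calls all of them SNG, while running demuxlet directly calls all of them DBL.

[Image: cells called SNG with the pileup and DBL without it]

The same thing happens in the other direction: demuxlet without the pileup calls these cells SNG, while the pileup-based run calls them DBL.

[Image: cells called SNG without the pileup and DBL with it]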

Compared with the overall distribution of LLK differences, these are borderline cells with low values of the SNG-vs-DBL LLK difference (diff_sng-DBL llk). But they still make up 10% of the cells, so the choice will have a large impact overall.
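To check whether the discordant cells are indeed the borderline ones, each discordant cell's SNG-vs-DBL log-likelihood difference can be compared against a cutoff. A minimal sketch, assuming the differences from one run are available as a dictionary keyed by barcode, with positive values favoring a singlet; the function name `borderline_fraction` and the cutoff value are illustrative assumptions, not demuxlet defaults.

```python
def borderline_fraction(diff_llk, discordant, cutoff):
    """Return the fraction of discordant cells whose |SNG - DBL LLK difference| < cutoff.

    diff_llk:   dict barcode -> SNG minus DBL log-likelihood difference (assumed:
                positive favors a singlet, negative favors a doublet).
    discordant: barcodes whose calls differ between the two runs.
    cutoff:     illustrative threshold on the absolute difference; not a demuxlet default.
    Barcodes missing from diff_llk are ignored.
    """
    scored = [bc for bc in discordant if bc in diff_llk]
    if not scored:
        return 0.0
    near_zero = [bc for bc in scored if abs(diff_llk[bc]) < cutoff]
    return len(near_zero) / len(scored)


# Hypothetical values for four cells, three of which were called differently.
diffs = {"AAAC-1": 0.4, "AAAG-1": -1.2, "AACT-1": 25.0, "AAGT-1": -0.1}
print(borderline_fraction(diffs, ["AAAC-1", "AAAG-1", "AAGT-1"], cutoff=1.0))
```

If most discordant cells fall under a small cutoff, the disagreement is concentrated in weakly supported calls rather than spread across confident ones.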

Best, Marwan

hyunminkang commented 4 months ago

This is not surprising. The two versions use slightly different default parameters for filtering reads. I believe what the pileup version does is more appropriate.
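A toy illustration of why small differences in read filtering mostly affect borderline cells: for a fixed pair of singlet and doublet hypotheses, and assuming reads contribute independently, the SNG-vs-DBL log-likelihood difference is a sum of per-read terms, so dropping a few reads can flip its sign when the total is close to zero. The quality filter, its thresholds and the per-read values below are hypothetical and do not correspond to demuxlet's actual options or likelihood model.

```python
def sng_minus_dbl(reads, min_base_qual):
    """Sum per-read SNG-minus-DBL log-likelihood terms for reads passing a quality filter.

    reads: list of (base_quality, llk_term) pairs for one cell, where llk_term is
           log P(read | singlet) - log P(read | doublet) for fixed hypotheses.
    min_base_qual: hypothetical filter threshold (not a demuxlet option).
    """
    return sum(term for qual, term in reads if qual >= min_base_qual)


def call(diff):
    # Toy decision rule: positive favors a singlet, otherwise a doublet
    # (no doublet prior and no ambiguous class, unlike a real caller).
    return "SNG" if diff > 0 else "DBL"


# Hypothetical borderline cell: two high-quality reads mildly favor a singlet,
# one low-quality read favors a doublet more strongly.
cell_reads = [(30, 0.8), (30, 0.5), (15, -1.6)]

for min_q in (10, 20):
    diff = sng_minus_dbl(cell_reads, min_q)
    print(min_q, round(diff, 2), call(diff))
```

With the lower threshold the low-quality read is kept and the toy cell is called DBL; with the higher threshold it is dropped and the call becomes SNG. A cell whose difference is far from zero would not flip this way.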