stschiff / sequenceTools

Other
39 stars 10 forks source link

PileupCaller calls "missing" when there should be sufficient reads #25

Closed stschiff closed 2 years ago

stschiff commented 2 years ago

(Issue opened on behalf of Pierre Luisi via email)

I am using PileupCaller to call pseudo-haploid genotypes for my data. I found out that for high quality reads mapping to the reference allele pileupcaller retruen a 9 (missing values). For instance, when the output of samtools mpileup is the following (in red what is not called by PileupCaller),

chr6    33924896        a      1       .       J       1       ,       J       3       .,.     JJJ     2       Cc      JJ      6       .,,,,.  FJJJJJ  1  , J

the output for pileupcaller is

999099

I've tried also to change the ASCII-encoded quality score in the output from the samtools mpileup before feeding it to PileupCaller, it doesn't resolve the issue. I can't find what I am doing inaccurately with PileupCaller.

The command I use are:

samtools mpileup \
-B -q $q -Q $Q -l $BED -f $fa \
| pileupCaller \
--randomHaploid \
--sampleNames $samplesList \
-f $snp \
-e MyPileUp.Q$Q.q$q

the line for the snp in the file $snp I feed is

rs9688730       chr6    0.52147 33924896        A       C

what matches the mpileup output. Thanks for your help and thanks for the tool!

stschiff commented 2 years ago

This turns out to be an issue in sequence-formats, and I've opened a related issue there: https://github.com/stschiff/sequence-formats/issues/5

stschiff commented 2 years ago

This is fixed now in v1.5.2