hsinnan75 / MapCaller

MapCaller – An efficient and versatile approach for short-read alignment and variant detection in high-throughput sequenced genomes
MIT License
29 stars 5 forks source link

Gaps END is 1 bp too long? #63

Closed tseemann closed 4 years ago

tseemann commented 4 years ago

Should END=39 be END=38 ? Because that is the last position that has depth 0 (zero) ?

# MapCaller v0.9.9.37
% MapCaller -r ref.fa -f R1.fq.gz -f2 R2.fq.gz -min_gap 1 -bam >(samtools sort > output.bam) 

% grep Gaps output.vcf

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  unknown
MN908947        1       .       A       <*>     0       Gaps    END=39  GT:GQ:DP:AD     *:*:*:*

% samtools depth -aa output.bam

MN908947        1       0
MN908947        2       0
MN908947        3       0
MN908947        4       0
<snip>
MN908947        35      0
MN908947        36      0
MN908947        37      0
MN908947        38      0
MN908947        39      2
MN908947        40      3
hsinnan75 commented 4 years ago

Sorry for the late response. I'll check the codes.

tseemann commented 4 years ago

Thank you.

tseemann commented 4 years ago

@hsinnan75 i hope you are well in these times! do you think you will ever be able to work on MapCaller again?

hsinnan75 commented 4 years ago

@tseemann Sorry, I have been very busy and forgotten to check the codes. I'll do it this weekend.

hsinnan75 commented 4 years ago

@tseemann I found a bug in reporting SAM output, and I checked the vcf output with another dataset. I am not sure if the revised version fixed the bug. Please update your MapCaller to 0.9.9.38 and try again, and let me know if it works or not. Thank you very much!

tseemann commented 4 years ago

I am still getting some errors:

MN908947.3 5319 . A <*> 0 Gaps END=5356 GT:GQ:DP:AD *:*:*:*

MN908947.3      5314    349
MN908947.3      5315    6
MN908947.3      5316    2
MN908947.3      5317    2
MN908947.3      5318    2
MN908947.3      5319    1   BEGIN IS WRONG
MN908947.3      5320    1
MN908947.3      5321    0  <--- SHOULD BEGIN HERE?
MN908947.3      5322    0
MN908947.3      5323    0
MN908947.3      5324    0
....
MN908947.3      5356    0   END IS CORRECT
MN908947.3      5357    1
MN908947.3      5358    1
hsinnan75 commented 4 years ago

Could you give me your read files, so that I can identify the bugs accordingly, thank you!

tseemann commented 4 years ago

On NCBI the reads are: https://www.ncbi.nlm.nih.gov/biosample/SAMN14422706/ Reads https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR11397719 BUT they have been trimmed by ivar trim first...

hsinnan75 commented 4 years ago

@tseemann I couldn't repeat the issue with the data set. I did not use ivar trim since I don't have the bed file to run ivar trim. Here is the output.

MT450923.1 5265 . A <*> 0 Gaps END=5515

MT450923.1 5263 2 MT450923.1 5264 2 MT450923.1 5265 0 MT450923.1 5266 0 MT450923.1 5267 0 MT450923.1 5268 0 MT450923.1 5269 0 MT450923.1 5270 0 .... MT450923.1 5513 0 MT450923.1 5514 0 MT450923.1 5515 0 MT450923.1 5516 207 MT450923.1 5517 212

The result matches the vcf entry that begins at 5265 and ends at 5515. Please show me how to download the primers.bed file. Thank you!

tseemann commented 4 years ago

Actually thar won't help because ivar trim only uses SOFT clipping in the BAM. It doesn't trim any actual FASTQ files. I am confused now.

hsinnan75 commented 4 years ago

If ivar trim uses SOFT clipping in the BAM, that means it would modify the original read alignments and the output of samtools depth would be affected by the modification. Maybe that's why you got different stats between MapCaller's vcf and samtools depth.

tseemann commented 4 years ago

Yes I think you are right. Thanks for understanding.