isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License

Question: graphmap align pacbio data #87

Open zwm0502 opened 6 years ago

zwm0502 commented 6 years ago

Hello, I have some questions about using GraphMap to align PacBio FASTQ data. I used this command:

graphmap align --out-fmt m5 --threads 8 -C --ordered --evalue 1e-5 --error-rate 0.15 -r ecoliMG1655.fasta -d query.fq -o out.m5

I found a detailed description of the m5 format on this page: https://github.com/PacificBiosciences/blasr/wiki/Blasr-Output-Format. The fields are:

qName qLength qStart qEnd qStrand tName tLength tStart tEnd tStrand score numMatch numMismatch numIns numDel mapQV qAlignedSeq matchPattern tAlignedSeq

I wrote a script to count the insertions, deletions, and substitutions in each alignment from qAlignedSeq and tAlignedSeq. However, the number of substitutions I count differs from numMismatch. In addition, matchPattern is sometimes inconsistent with the aligned columns of qAlignedSeq and tAlignedSeq.
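For reference, here is a minimal sketch of the kind of counting script described above. It is not GraphMap's or BLASR's own code. It assumes the m5 fields are whitespace-delimited with no spaces inside qName or tName, that gaps are written as `-`, that a gap in tAlignedSeq counts as an insertion and a gap in qAlignedSeq as a deletion, and that bases are compared case-insensitively. The function names `parse_m5_line`, `count_events`, and `compare_with_reported` are hypothetical helpers introduced only for this illustration.

```python
# Field names as listed in the m5 header above.
M5_FIELDS = (
    "qName qLength qStart qEnd qStrand tName tLength tStart tEnd tStrand "
    "score numMatch numMismatch numIns numDel mapQV "
    "qAlignedSeq matchPattern tAlignedSeq"
).split()


def parse_m5_line(line):
    """Split one m5 line into a dict keyed by field name.

    Assumption: fields are whitespace-delimited and names contain no spaces.
    """
    parts = line.split()
    if len(parts) != len(M5_FIELDS):
        raise ValueError(f"expected {len(M5_FIELDS)} fields, got {len(parts)}")
    return dict(zip(M5_FIELDS, parts))


def count_events(q_aln, t_aln, ignore_case=True):
    """Count matches, mismatches, insertions and deletions column by column.

    Assumption: '-' marks a gap; a gap in the target is an insertion in the
    query, and a gap in the query is a deletion.
    """
    if len(q_aln) != len(t_aln):
        raise ValueError("aligned sequences differ in length")
    if ignore_case:
        q_aln, t_aln = q_aln.upper(), t_aln.upper()
    counts = {"numMatch": 0, "numMismatch": 0, "numIns": 0, "numDel": 0}
    for q, t in zip(q_aln, t_aln):
        if q == "-" and t == "-":
            continue  # a column with two gaps carries no event
        if t == "-":
            counts["numIns"] += 1
        elif q == "-":
            counts["numDel"] += 1
        elif q == t:
            counts["numMatch"] += 1
        else:
            counts["numMismatch"] += 1
    return counts


def compare_with_reported(record):
    """Return {field: (reported, recomputed)} for the four count fields."""
    recomputed = count_events(record["qAlignedSeq"], record["tAlignedSeq"])
    return {key: (int(record[key]), recomputed[key]) for key in recomputed}
```

Running `compare_with_reported(parse_m5_line(line))` on each line of out.m5 puts the reported and recomputed values side by side, which makes it easy to see which lines disagree.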

In the attached example, I counted 2888 substitutions from qAlignedSeq and tAlignedSeq, but numMismatch on that line is 45. Could you please tell me why? How do you define a substitution? Is every position where one base is replaced by another counted? And what is the meaning of '' in the matchPattern field? The sum of numMismatch, numIns, and numDel equals the number of '' on some lines but not on others. So is GraphMap suitable for aligning PacBio data, or is it tailored to nanopore data? Thanks very much in advance for your help.

1.m5.txt