isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License

align -x overlap produces .sam file CIGAR_MAPS_OFF_REFERENCE #66

Open miles-gene opened 7 years ago

miles-gene commented 7 years ago

Hello,

I am using graphmap 0.5.1 to overlap Illumina MiSeq reads derived from enrichment sequencing. The reads are enriched for a large gene family, and I hope to produce large numbers of relatively short contigs, each corresponding to a gene family member. I don't know whether graphmap is an appropriate tool for this, but from reading the docs it seemed promising. As a test I used ~200,000 reads (the reads are paired-end, but I'm ignoring the pairing for now).

graphmap align -x overlap -r Blb_S11.cleaned.R1.fastq -d Blb_S11.cleaned.R1.fastq -o Blb_S11R1_gmap_test.sam

The resulting .sam file cannot be converted to .bam and sorted etc. by samtools, which reports:

[W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped

Picard tools ValidateSamFile identifies the following problems:

ERROR:CIGAR_MAPS_OFF_REFERENCE 174420
ERROR:INVALID_ALIGNMENT_START 5514
ERROR:INVALID_MAPPING_QUALITY 18417
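For anyone who wants to see which records trigger these errors, here is a minimal Python sketch that tallies two of the conditions straight from the SAM text. It only approximates the checks (it is not Picard's actual logic), and the function names and category labels are made up for illustration. It assumes the file has @SQ header lines with SN and LN tags.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def reference_span(cigar):
    """Number of reference bases covered by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def check_sam(lines):
    """Count mapped records with suspicious coordinates (hypothetical helper)."""
    ref_len = {}
    problems = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                ref_len[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4:  # unmapped reads carry no meaningful coordinates
            continue
        if pos < 1:
            # Mapped read with a zero start, the case samtools warns about.
            problems["zero_or_negative_start"] += 1
        elif cigar != "*" and rname in ref_len:
            end = pos + reference_span(cigar) - 1  # 1-based inclusive end
            if end > ref_len[rname]:
                problems["runs_past_reference_end"] += 1
    return problems
```

It can be run on a file with something like `check_sam(open("Blb_S11R1_gmap_test.sam"))`. Note that soft-clipped bases (S) do not count towards the reference span, so a read that merely overhangs the reference as soft clipping would not be flagged here.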

I'm confused, as it seems to me that when overlapping one would expect many reads to map off the reference at one end at least. If I take the same reads and map them to a reference sequence (a region of one of the gene family members, obtained by sequencing a PCR product) using graphmap align -x illumina, the resulting .sam alignment has reads that overhang the reference at both the start and the end, and samtools doesn't complain. That work was done with graphmap v0.4.

Do you think my strategy is plausible in the first place, and is graphmap an appropriate tool? Are there any changes from v0.4 to v0.5.1 that might explain why samtools doesn't like overhangs?

Thanks for reading,
Miles

robegan21 commented 7 years ago

Hi Miles,

I found similar problems with v0.5.1 where alignments were starting and/or ending off the reference. A few days ago I submitted a pull request for a fix that has been working for me: https://github.com/isovic/graphmap/pull/64

Basically it just drops the reads that graphmap was aligning with invalid coordinates on the reference, so I'm hoping that isovic can fix the underlying problem and get proper alignments for those reads in the future.
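If you can't rebuild graphmap with the patch, a similar effect can be had by post-filtering the .sam file. The Python sketch below is not the code from the pull request, just an illustration of the same idea under my own assumptions: `drop_invalid_records` is a hypothetical helper, and "invalid" here means a mapped record whose start is below 1 or whose CIGAR runs past the reference length given in the @SQ header.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def _ref_span(cigar):
    """Reference bases consumed by a CIGAR string (M, D, N, = and X)."""
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MDN=X")


def drop_invalid_records(lines):
    """Yield SAM lines, skipping mapped records with off-reference coordinates."""
    ref_len = {}
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
                ref_len[tags["SN"]] = int(tags["LN"])
            yield line  # header lines always pass through
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if not flag & 0x4:  # only mapped records are checked
            if pos < 1:
                continue
            if cigar != "*" and rname in ref_len:
                if pos + _ref_span(cigar) - 1 > ref_len[rname]:
                    continue
        yield line
```

A typical use would be `out.writelines(drop_invalid_records(inp))` with the two files opened for writing and reading. As with the patch, this only hides the affected reads; it does not recover proper alignments for them.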

-Rob

miles-gene commented 7 years ago

Hi,

Thanks for the suggestion, Rob. Your comment made me wonder whether the graphmap version was significant. Interestingly, when I repeated the experiment using graphmap v0.4.1, samtools converted and sorted the resulting .sam file without any problem. Also, v0.4 produced an 11GB .sam file while v0.5.1 produced a 1.3GB file.

Miles