mummer4 / mummer

Mummer alignment tool

negative strand alignment of sam output format is confusing. #143

Open baoxingsong opened 4 years ago

baoxingsong commented 4 years ago

[image] I used the command "nucmer -t 8 --sam-long=mumer.short.sam tair10.fa can_0.fa" to generate the result. The negative-strand SAM records look wrong: either they should not be output at all, or their coordinates are incorrect.

This problem was found in betav2 and rc1.

alekseyzimin commented 3 years ago

What exactly is wrong with the reverse strand alignments? Does the information you see in IGV correspond to the coordinates you get from show-coords?
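One way to make that comparison concrete is to compute the reference interval each SAM record covers and set it against the reference coordinates that show-coords reports. The following is a minimal sketch, not part of MUMmer: the helper names `reference_span`, `is_reverse`, and `parse_sam_line` are hypothetical, and it assumes only what the SAM specification defines (POS is the 1-based leftmost reference position, flag bit 0x10 marks a reverse-strand record, and the CIGAR operations M, D, N, = and X consume reference bases).

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) covered on the reference."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                 if op in REF_CONSUMING)
    return pos, pos + length - 1


def is_reverse(flag):
    """True if flag bit 0x10 (reverse strand) is set."""
    return bool(flag & 0x10)


def parse_sam_line(line):
    """Extract the fields used here from one SAM alignment line (hypothetical helper)."""
    f = line.rstrip("\n").split("\t")
    return {"qname": f[0], "flag": int(f[1]), "rname": f[2],
            "pos": int(f[3]), "cigar": f[5]}
```

If the interval computed for a reverse-strand record disagrees with the reference start and end that show-coords gives for the same alignment, that would point at the coordinates rather than at the viewer.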

baoxingsong commented 3 years ago

I was looking at the alignments in IGV. IGV suggested that the sequences in those reverse-strand alignments are not similar to each other.

alekseyzimin commented 3 years ago

Can you come up with a simple example to demonstrate the problem?

baoxingsong commented 3 years ago

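[image: image] https://user-images.githubusercontent.com/18551962/104864065-efee1c00-5905-11eb-8f17-8da38d771d9e.png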

Here is an IGV screenshot. I expect an alignment to start with a match and end with a match. All the forward alignments that I looked at fit this expectation, while a large number of reverse-strand alignments start with mismatches. I generated another alignment using the command nucmer -t 10 --sam-long=mumer.edi_0.long.sam tair10.fa edi_0.fa and converted the SAM file to a BAM file with the command samtools view -O BAM --reference tair10.fa mumer.edi_0.long.sam | samtools sort - > mumer.edi_0.long.bam; samtools index mumer.edi_0.long.bam. If you want me to share the SAM file or the input files, please let me know.
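The expectation that an alignment starts and ends with a match can also be checked without a viewer. The sketch below is illustrative only and is not how nucmer or IGV work internally: `end_columns_match` is a hypothetical helper that walks the CIGAR string and compares the first and last aligned columns against the reference. It relies on the SAM specification's convention that SEQ for a reverse-strand record (flag 0x10) is stored already reverse-complemented, so the same comparison applies to both strands. It assumes the whole reference sequence is available as a string and that SEQ is present in the record.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def end_columns_match(ref_seq, pos, cigar, seq):
    """Return (first_matches, last_matches) for one SAM record.

    ref_seq: full reference sequence as a string
    pos:     1-based leftmost reference position (SAM POS)
    cigar:   SAM CIGAR string
    seq:     SAM SEQ field exactly as stored in the record
    """
    r = pos - 1  # 0-based index into the reference
    q = 0        # 0-based index into SEQ
    first = last = None
    for n, op in CIGAR_OP.findall(cigar):
        n = int(n)
        if op in "M=X":  # aligned columns consume both sequences
            if first is None:
                first = ref_seq[r].upper() == seq[q].upper()
            last = ref_seq[r + n - 1].upper() == seq[q + n - 1].upper()
            r += n
            q += n
        elif op in "DN":  # deletion or skip consumes the reference only
            r += n
        elif op in "IS":  # insertion or soft clip consumes SEQ only
            q += n
        # H and P consume neither sequence
    return first, last
```

Running this over the reverse-strand records and counting how many return False for the first column would turn the IGV observation into a number that is easy to report.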

alekseyzimin commented 3 years ago

Hello,

Can you share the input files, limited to the sequences that produce the problematic behavior? You are aligning to the tair10 reference, right?

--Aleksey


-- Dr. Alexey V. Zimin Associate Research Scientist Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA (301)-437-6260 website http://ccb.jhu.edu/people/alekseyz/ blog http://masurca.blogspot.com

baoxingsong commented 3 years ago

Hi, you can find the input files and the output file here: https://cornell.app.box.com/s/bl6ny2wirwub666z07eg72wmsdmvzdgx

alekseyzimin commented 3 years ago

Thank you, I will take a look.
