waveygang / wfmash

base-accurate DNA sequence alignments using WFA and mashmap3
MIT License

Force global (a.k.a. end-to-end) alignment #203

Open JingaJenga opened 11 months ago

JingaJenga commented 11 months ago

Hello!

I'm using wfmash to align together the two assembled haplotype sequences of a genomic region. I get an alignment that is good but does not go all the way to the ends of the two sequences. This is especially strange because I know the sequences match at their ends (42kbp perfect match.) Is there any way I can force wfmash to align end-to-end?

Cheers, -- Josh

ekg commented 11 months ago

-N may do what you want.

If not, you can force the alignment by constructing a full-length mapping in PAF format and feeding it into wfmash as input with the -i flag.

ekg commented 11 months ago

If you can share the test case we might be able to figure out what's going wrong. It could help us a lot if there is a bug causing this.

JingaJenga commented 11 months ago

Thanks for your quick response!

I discovered the -N flag already and it sorta helps, but not fully. Without it I get two separate alignments, which don't cover the whole sequence even combined. With it the alignments are merged.

How would I construct a full length mapping in PAF format? Would I run wfmash once with --approx-map, then manually edit the paf file's CIGAR string before running wfmash again with -i?

JingaJenga commented 11 months ago

Here's an example of a failure. See the attached files (you can remove the .txt extension; I added it so GitHub would upload the files). The fasta has 2 sequences, which are ~800 kbp each and which are identical for the first 42 kbp. When I align them as follows:

samtools faidx wfmash_fail.fasta
wfmash -N -s 5000 -l 25000 -p 90 -n 1 -k 19 -H 0.001 -X -t 32 wfmash_fail.fasta

I get the attached PAF file, in which the alignment starts at 1408 on both sequences. It seems like, even if wfmash were not explicitly aiming for an end-to-end match, it should extend the alignment to the start of the sequences.

Cheers, -- Josh

wfmash_fail.fasta.txt wfmash_fail.paf.txt

JingaJenga commented 11 months ago

By the way I'm using wfmash v0.10.3-3-g8ba3c53 if that helps.

ekg commented 11 months ago

Could you try default parameters on current master and also -p 70?

JingaJenga commented 11 months ago

Setting -p 70 worked, thank you! It looks like the lower I set -p, the earlier the alignment starts:

-p 90 -> 1408
-p 80 -> 256
-p 70 -> 0

This is great, but I don't understand how it could follow logically - is this just my lack of understanding of the mashmap algorithm?

JingaJenga commented 11 months ago

Returning to my original question - how can I guarantee that the alignment will be end-to-end? It seems like reducing -p to 70 (or even as low as 50) still fails to produce end-to-end alignments in some other situations.
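
One way to confirm whether a given run actually reached both ends is to read the coordinates straight from the PAF output. Below is a minimal Python sketch, assuming the standard PAF column order (query length, start, end in columns 2-4; target length, start, end in columns 7-9). The function names and the sample record are made up for illustration and are not part of wfmash; only the 1408 start offset comes from the report above.

```python
def paf_end_gaps(paf_line):
    """Return the number of unaligned bases at each end of query and target."""
    f = paf_line.rstrip("\n").split("\t")
    qlen, qstart, qend = int(f[1]), int(f[2]), int(f[3])
    tlen, tstart, tend = int(f[6]), int(f[7]), int(f[8])
    return {
        "query_head": qstart,        # bases skipped before the alignment starts
        "query_tail": qlen - qend,   # bases left over after it ends
        "target_head": tstart,
        "target_tail": tlen - tend,
    }


def is_end_to_end(paf_line):
    """True only if the record spans both sequences completely."""
    return all(gap == 0 for gap in paf_end_gaps(paf_line).values())


# Hypothetical record: names, lengths and end coordinates are invented;
# the start offset of 1408 mirrors the -p 90 result described in this thread.
example = "seqA\t800000\t1408\t800000\t+\tseqB\t800000\t1408\t800000\t0\t798592\t255"
print(paf_end_gaps(example))
print(is_end_to_end(example))
```

Running this over every line of a PAF file shows which parameter settings leave unaligned bases at either end, without having to inspect the file by hand.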

biomonika commented 6 months ago

I have the same question -- would love to have an option to force end-to-end alignments.

ekg commented 6 months ago

wfmash -i will perform global alignment of the range pairs it takes as input.
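
For anyone wondering how to build such a range pair by hand, here is a rough Python sketch that reads sequence names and lengths from a samtools .fai index and writes a single PAF record covering both sequences in full. It assumes the standard 12-column PAF layout with 0-based, end-exclusive coordinates. The function names, the sample index text, and the placeholder values in columns 10-12 are illustrative assumptions, not something wfmash documents. wfmash's own --approx-map output may carry extra tag columns that the aligner expects, so compare against that output before relying on this.

```python
import io


def read_fai(handle):
    """Return [(name, length), ...] from a samtools .fai index (columns 1 and 2)."""
    records = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1])))
    return records


def full_length_paf_line(query, target, strand="+"):
    """Build one 12-column PAF record spanning all of query and target."""
    qname, qlen = query
    tname, tlen = target
    # Columns 10-12 (matches, block length, MAPQ) are placeholders, since
    # nothing has been aligned yet; 255 is the PAF convention for missing MAPQ.
    fields = [qname, qlen, 0, qlen, strand,
              tname, tlen, 0, tlen,
              0, max(qlen, tlen), 255]
    return "\t".join(str(x) for x in fields)


# Hypothetical index contents; real input would be the .fai written by samtools faidx.
fai_text = "hap1\t800000\t6\t60\t61\nhap2\t790000\t813346\t60\t61\n"
seqs = read_fai(io.StringIO(fai_text))
print(full_length_paf_line(seqs[0], seqs[1]))
```

The printed line can be saved to a file and passed to wfmash with -i alongside the FASTA, as described earlier in this thread.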
