wjwei-handsome / wgatools

Whole Genome Alignment Tools
MIT License

paf2maf fail with FastGA and wfmash #15

Open baozg opened 1 week ago

baozg commented 1 week ago

Hi Wenjie,

I have run into a problem when using wgatools paf2maf. It works with minimap2 and AnchorWave PAF output, but it currently fails with FastGA and wfmash alignments. I checked the alignments and they look fine to me. Could you help me find out what is wrong with them?

https://keeper.mpdl.mpg.de/d/4b78e4b87c0449d3b821/ Col-CC is the target and Ler-0 is the query; the wgatools folder contains the FastGA and wfmash alignments.

wjwei-handsome commented 1 week ago

Hi @baozg ,

I checked the wfmash data. The problem is that the query coordinates do not add up:

query_end != query_start + Match/Mismatch + INS_size

The target positions look correct.

This is based on the following rule, where Match/Mismatch is the total length of matched and mismatched bases:

query_start + Match/Mismatch + INS_size = query_end
ref_start + Match/Mismatch + DEL_size = ref_end
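
The two equations above can be checked mechanically for any PAF record that carries a `cg:Z:` CIGAR tag. The following is only a minimal Python sketch of such a check, not how wgatools implements it; the function names `cigar_spans` and `check_paf_record` are made up for illustration, and the sketch assumes the CIGAR contains only the operations `M`, `=`, `X`, `I`, `D` and `N`.

```python
import re

# Hypothetical helpers for illustration; not part of wgatools.
CIGAR_FULL = re.compile(r"(?:\d+[MIDN=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDN=X])")


def cigar_spans(cigar):
    """Return (query_span, target_span) consumed by a CIGAR string."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"unsupported or malformed CIGAR: {cigar!r}")
    query_span = target_span = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # match/mismatch: consumes query and target
            query_span += n
            target_span += n
        elif op == "I":      # insertion: consumes query only
            query_span += n
        else:                # D or N: consumes target only
            target_span += n
    return query_span, target_span


def check_paf_record(line):
    """Return (query_ok, target_ok) for one tab-separated PAF line."""
    fields = line.rstrip("\n").split("\t")
    q_start, q_end = int(fields[2]), int(fields[3])
    t_start, t_end = int(fields[7]), int(fields[8])
    # Optional tags start at column 13; take the first cg:Z: tag.
    cigar = next(f[5:] for f in fields[12:] if f.startswith("cg:Z:"))
    q_span, t_span = cigar_spans(cigar)
    return q_start + q_span == q_end, t_start + t_span == t_end


# Toy record (invented for the example, not taken from the real data).
good = "q\t100\t10\t25\t+\tt\t200\t5\t21\t12\t18\t60\tcg:Z:10=2I1X3D2="
bad = good.replace("\t25\t", "\t24\t")  # query end shifted by one base
print(check_paf_record(good))  # (True, True)
print(check_paf_record(bad))   # (False, True)
```

Because PAF coordinates are 0-based and half-open, the same check applies to records on either strand.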

I also checked the output of an older wfmash version (v0.12.1-5-gd6532bc), and it looks correct.

This also gave me an idea: I will develop a command to validate PAF files, which could perhaps also repair an invalid PAF.

As for FastGA, it seems to reverse the order of query and target in the PAF file 😵‍💫: if I swap the target and query FASTA files, everything works fine. Either FastGA writes the target and query in the opposite order, or your input does not match what FastGA expects :)
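
One way to test for such a swap without rerunning the conversion is to compare the names and lengths in PAF columns 1-2 and 6-7 against the sequence lengths of the two FASTA files (for example, as read from their `.fai` indexes). The sketch below is only an illustration under that assumption; `guess_orientation` is a hypothetical helper, the two Chr1 lengths are the ones that appear in the wfmash record in this thread (assigned to Ler-0 and Col-CC on the assumption that Ler-0 is the query), and the swapped line is constructed by hand rather than copied from real FastGA output.

```python
def guess_orientation(paf_line, query_lengths, target_lengths):
    """Guess whether a PAF line lists query and target in the expected order.

    query_lengths / target_lengths map sequence name -> length for the
    FASTA files that were meant to be the query and the target.
    """
    fields = paf_line.rstrip("\n").split("\t")
    q_name, q_len = fields[0], int(fields[1])
    t_name, t_len = fields[5], int(fields[6])

    as_given = (query_lengths.get(q_name) == q_len
                and target_lengths.get(t_name) == t_len)
    swapped = (query_lengths.get(t_name) == t_len
               and target_lengths.get(q_name) == q_len)

    if as_given and not swapped:
        return "as expected"
    if swapped and not as_given:
        return "swapped"
    return "undetermined"  # e.g. identical names and lengths in both files


# Chr1 lengths taken from the wfmash record quoted in this thread
# (assumption: Ler-0 is the query, Col-CC is the target).
ler0 = {"Chr1": 32485061}
colcc = {"Chr1": 32637894}

expected_line = "Chr1\t32485061\t3030000\t3079896\t+\tChr1\t32637894\t3030124\t3080307"
swapped_line = "Chr1\t32637894\t3030124\t3080307\t+\tChr1\t32485061\t3030000\t3079896"

print(guess_orientation(expected_line, ler0, colcc))  # as expected
print(guess_orientation(swapped_line, ler0, colcc))   # swapped
```

The check is only a heuristic: it cannot decide anything when both genomes use the same sequence names with identical lengths.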

I hope this is helpful to you. Please keep in touch if you have any questions later!

Best regards, Wenjie

wjwei-handsome commented 1 week ago

> sed -n '64p' Col-CC_Ler-0_MPIBT.wfmash_21.paf
Chr1    32485061    3030000 3079896 +   Chr1    32637894    3030124 3080307 49819   50214   21  gi:f:0.996241   bi:f:0.992134   md:f:0.996967   cg:Z:[.....]
> math 3030000+49819+150+31 # q_start=3030000 match_size=49819 mismatch_size=150 ins_size=31
3080000 # the correct query end is 3080000, but the record says 3079896
> sed -n '64p' Col-CC_Ler-0_MPIBT.wfmash_21.paf|sed 's/3079896/3080000/g'|wgatools p2m -g Col-CC.chr.fa.gz -q Ler-0.fa.gz -o test.maf -r # with the corrected query end, everything is OK
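
The same arithmetic can be extended to the target side to confirm that only the query end is off. This is a small Python sketch using the numbers from the record above. The deletion size is not given in the thread; it is derived here under the assumption that PAF column 11 (the alignment block length, 50214) equals matches + mismatches + insertions + deletions.

```python
# Values copied from line 64 of the wfmash PAF quoted above.
q_start, q_end = 3030000, 3079896
t_start, t_end = 3030124, 3080307
matches, block_len = 49819, 50214   # PAF columns 10 and 11
mismatches, ins_size = 150, 31      # sizes quoted in the comment above

# Assumption: block_len = matches + mismatches + insertions + deletions.
del_size = block_len - matches - mismatches - ins_size

expected_q_end = q_start + matches + mismatches + ins_size
expected_t_end = t_start + matches + mismatches + del_size

print(del_size)                   # 214
print(expected_q_end, q_end)      # 3080000 3079896
print(expected_t_end, t_end)      # 3080307 3080307
print(expected_q_end - q_end)     # 104
```

Under that assumption the target end matches exactly, while the reported query end falls 104 bases short of the expected value.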

wjwei-handsome commented 6 days ago

Reopening this as a reminder to myself to develop the validate subcommand 🤖

wjwei-handsome commented 5 days ago

Whoo hoo

The validate subcommand is done in https://github.com/wjwei-handsome/wgatools/commit/53c57fa78818b305a78acade66a7fa07b542a7b3