schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

minimap2-2.20: inversions disappear! #91

Open ricardo-aaron opened 3 years ago

ricardo-aaron commented 3 years ago

Hi, I just switched minimap2 from 2.17 to 2.20, and now my syri plots (from the same genome inputs) lose inversions, translocations and duplications. They also have more gaps, some of them where inversions used to be. What's going on? Should I go back to minimap2 2.17?

mnshgl0110 commented 3 years ago

Hi Ricardo.

It seems that some significant changes have been made to minimap2 in the last couple of months. The release notes of version 2.19 state "Improvement: more contiguous alignment through long INDELs." This is elaborated further in the README of unimap, which says "With the default asm5 preset, unimap may align a highly diverged region as a long insertions followed by a long deletion."

I suspect that these changes are making the latest version of minimap2 less sensitive to genomic rearrangements. So, instead of producing separate alignments for the rearranged regions, it treats them as large indels within otherwise syntenic alignments. This could easily cause the changes that you observe. But I would have to test this in more detail to know exactly what is happening.
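One way to check this on your own alignments is to look for very long insertions and deletions inside single alignment records, since a rearranged region absorbed into a syntenic alignment would show up that way. Below is a minimal sketch of a hypothetical helper (long_indels is not part of minimap2 or SyRI) that scans a SAM CIGAR string; the 10 kb cutoff is an arbitrary assumption.

```python
import re

# CIGAR operations as defined by the SAM specification: a length followed by one of MIDNSHP=X.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume the reference, used to track the reference coordinate.
REF_CONSUMING = set("MDN=X")


def long_indels(cigar, ref_start, min_len=10_000):
    """Hypothetical helper: return (op, ref_pos, length) for every I or D op >= min_len.

    ref_start is the 1-based alignment start (SAM POS); ref_pos is the 1-based
    reference coordinate at which the indel occurs. min_len is an arbitrary cutoff.
    """
    hits = []
    pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "ID" and length >= min_len:
            hits.append((op, pos, length))
        if op in REF_CONSUMING:
            pos += length  # insertions and clips do not advance the reference
    return hits
```

For example, long_indels("5000=20000I18000D3000=", 100) reports a 20 kb insertion and an 18 kb deletion at the same reference position (5100), which is the "long insertion followed by a long deletion" pattern described in the unimap README.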

You can try the asm20 preset of minimap2, in case it is more sensitive to genomic rearrangements. However, going back to 2.17 would probably be the easiest solution for you.

If possible, it would be great if you could share your alignments generated with minimap2 2.17 and 2.20, as that would help in understanding what exactly is happening.

Manish

ricardo-aaron commented 3 years ago

Here are the alignments:

minimap2-2.17:
https://data.cyverse.org/dav-anon/iplant/home/rchavez/minimaps/gobar526_vs_gohir527.log
https://data.cyverse.org/dav-anon/iplant/home/rchavez/minimaps/gobar526_vs_gohir527.sam

minimap2-2.20:
https://data.cyverse.org/dav-anon/iplant/home/rchavez/minimaps/gobar526_vs_gohir527v2.log
https://data.cyverse.org/dav-anon/iplant/home/rchavez/minimaps/gobar526_vs_gohir527v2.sam

RNieuwenhuis commented 2 years ago

minimap2 just had a new release supposedly addressing these issues. https://github.com/lh3/minimap2/releases/tag/v2.23

mnshgl0110 commented 2 years ago

@RNieuwenhuis Thanks for sharing. Yes, this release should fix the issue with inversions, but there may still be issues with large gaps and missing translocations/duplications (https://github.com/lh3/minimap2/issues/830).

Have you checked whether the new release aligns translocations/duplications correctly?

Chenglin20170390 commented 2 years ago

Same question here. I found different alignments at an inversion breakpoint between minimap2 2.17 and 2.24 using the same command, minimap2 -x asm5 --secondary=no. I noticed the note under "Issues in inversion detection" in your GitHub repository: "The recent releases of minimap2 (2.18-2.22) have some bug which results in inverted regions not getting aligned correctly. This issue is mostly fixed in the current master branch (HEAD node) of minimap2 repository. So, for accurate structural rearrangement identification with SyRI please use the latest version of minimap2."

We would like to identify inversion breakpoints as accurately as possible with minimap2 and syri. Which version, 2.24 or 2.17, would you recommend for better inversion breakpoint detection?

mnshgl0110 commented 2 years ago

The primary difference I could find between the two minimap2 versions is how they handle large gaps. Newer versions allow longer gaps within an alignment. Consequently, there is one large inverted alignment spanning 68269-114037 with minimap2 2.24, compared to two alignments with minimap2 2.17.
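To see this split-versus-merged difference directly, one could count the reverse-strand (inverted) alignments overlapping the region in each version's output. This is a minimal sketch, assuming PAF output (what minimap2 -x asm5 writes when -a is not given) and treating the region as reference (target) coordinates; the function name, chromosome name and file names are hypothetical.

```python
def inverted_alignments(paf_lines, chrom, start, end):
    """Hypothetical helper: return (tstart, tend, qstart, qend) for every '-' strand
    PAF record on target sequence `chrom` that overlaps [start, end).

    PAF columns used: 3-4 query start/end, 5 strand, 6 target name, 8-9 target
    start/end. PAF coordinates are 0-based and half-open.
    """
    hits = []
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\n").split("\t")
        qstart, qend, strand = int(f[2]), int(f[3]), f[4]
        tname, tstart, tend = f[5], int(f[7]), int(f[8])
        if strand == "-" and tname == chrom and tstart < end and tend > start:
            hits.append((tstart, tend, qstart, qend))
    return sorted(hits)


if __name__ == "__main__":
    # Placeholder file name; 68268 is the 0-based start of the 1-based position 68269.
    with open("aln_v2.24.paf") as fh:
        print(len(inverted_alignments(fh, "chr1", 68268, 114037)))
```

Running it on the PAF files from both versions with the same interval would show whether the inverted region comes out as one record or several.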

In this example, syri would identify the same inversion block with the same breakpoints (though I think the region is actually a TD). The difference would be in the number of inverted alignments (INVAL) within the inversion block and in how indels/SVs are estimated within this inversion.
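The INVAL difference can be checked in syri's output by counting, for each inversion block (INV), the INVAL records that name it as their parent. A minimal sketch, assuming the tab-separated syri.out layout in which column 9 holds the annotation ID, column 10 the parent ID and column 11 the annotation type; inval_counts is a hypothetical helper, not part of SyRI.

```python
from collections import Counter


def inval_counts(syri_out_lines):
    """Hypothetical helper: map each INV block ID to its number of INVAL children.

    Assumed syri.out columns (1-based): 9 = annotation ID, 10 = parent ID,
    11 = annotation type.
    """
    inv_ids = []
    counts = Counter()
    for line in syri_out_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue  # skip malformed or empty lines
        ann_id, parent, kind = f[8], f[9], f[10]
        if kind == "INV":
            inv_ids.append(ann_id)
        elif kind == "INVAL":
            counts[parent] += 1
    # Counter returns 0 for inversion blocks that have no INVAL records.
    return {inv: counts[inv] for inv in inv_ids}
```

Comparing the dictionaries produced from the 2.17-based and 2.24-based syri runs would show where the same inversion block is supported by a different number of alignments.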

My guess is that if the primary focus is on the inversion breakpoints, v2.24 could be better. However, if the genomic variation inside the structural rearrangements is also of interest, you could try v2.17, or set lower values of the -r parameter with v2.24 (for example, -r 1k,10k).
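Rerunning the alignment with different settings is easiest to script. The sketch below only builds argument lists; minimap2_cmd and syri_cmd are hypothetical helpers. It assumes a minimap2 release in which -r accepts two comma-separated bandwidths, as the example -r 1k,10k implies, and uses -a --eqx to produce SAM with =/X CIGAR operations, the flags used in SyRI's documented SAM workflow. The same helper also covers the asm20 preset suggested earlier in the thread.

```python
def minimap2_cmd(ref, qry, preset="asm5", bandwidth=None, threads=4):
    """Hypothetical helper: build a minimap2 argv that writes SAM with =/X ops."""
    # -x is given first so that later options such as -r override the preset.
    cmd = ["minimap2", "-a", "-x", preset, "--eqx", "-t", str(threads)]
    if bandwidth is not None:
        # Assumes a minimap2 version where -r takes "NUM1[,NUM2]", e.g. "1k,10k".
        cmd += ["-r", bandwidth]
    cmd += [ref, qry]  # positional arguments go last
    return cmd


def syri_cmd(sam, ref, qry):
    """Hypothetical helper: build a syri argv for a SAM alignment file (-F S)."""
    return ["syri", "-c", sam, "-r", ref, "-q", qry, "-F", "S"]
```

For example, subprocess.run(minimap2_cmd("ref.fa", "qry.fa", bandwidth="1k,10k"), stdout=sam_handle, check=True) would run the alignment with the lower bandwidths; the file names and the sam_handle output file are placeholders.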

Chenglin20170390 commented 2 years ago

Thanks for your advice.