schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

The "TRANS" in syri.out have overlap #76

Closed: jigaoxiang closed this issue 3 years ago

jigaoxiang commented 3 years ago

Dear Prof.: Sorry to bother you. As shown below, when I selected the "TRANS" (translocation) entries in syri.out, I found two that overlap, which confused me. Looking forward to your reply. Thanks!

C1 17976077 18027842 - - C1 16813190 16869297 TRANS10269 - TRANS -
C1 18013990 18031887 - - C1 16855451 16870570 TRANS10270 - TRANS -
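To put a number on the overlap in the two rows above, here is a minimal sketch that parses syri.out lines and measures the overlap in both genomes. It assumes the usual 12-column syri.out layout (ref chr, ref start, ref end, ref seq, query seq, query chr, query start, query end, ID, parent ID, type, copy status) and treats coordinates as 1-based and inclusive; `SyriRecord`, `parse_line` and `overlap_bp` are hypothetical helpers, not part of SyRI.

```python
from typing import NamedTuple

class SyriRecord(NamedTuple):
    # Hypothetical container for the columns used here
    ref_chr: str
    ref_start: int
    ref_end: int
    qry_chr: str
    qry_start: int
    qry_end: int
    sv_id: str
    sv_type: str

def parse_line(line: str) -> SyriRecord:
    # split() on any whitespace works for tab-separated files and the pasted rows
    f = line.split()
    return SyriRecord(f[0], int(f[1]), int(f[2]), f[5], int(f[6]), int(f[7]), f[8], f[10])

def overlap_bp(s1: int, e1: int, s2: int, e2: int) -> int:
    # Overlap of two 1-based inclusive intervals on the same chromosome (0 if disjoint)
    return max(0, min(e1, e2) - max(s1, s2) + 1)

rows = [
    "C1 17976077 18027842 - - C1 16813190 16869297 TRANS10269 - TRANS -",
    "C1 18013990 18031887 - - C1 16855451 16870570 TRANS10270 - TRANS -",
]
a, b = [parse_line(r) for r in rows]
ref_olp = overlap_bp(a.ref_start, a.ref_end, b.ref_start, b.ref_end)
qry_olp = overlap_bp(a.qry_start, a.qry_end, b.qry_start, b.qry_end)
print(ref_olp, qry_olp)  # 13853 13847
```

Both calls sit on C1 in both genomes, so the chromosome check is skipped here; a general version should compare `ref_chr` and `qry_chr` first.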

mnshgl0110 commented 3 years ago

Hi, SyRI does allow some overlap between translocations, but here the overlap seems quite large. Are you using the latest version of SyRI? In the latest release I added the --tdmaxolp parameter, which limits the overlap between translocations. Please try it; that should solve this issue.
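As a rough illustration of what a maximum-overlap threshold between two translocations means, the sketch below accepts a pair only if their overlap fraction stays at or below a cutoff in both genomes. This is NOT SyRI's implementation: defining the fraction as overlap divided by the length of the shorter interval is an assumption made for illustration, and `compatible` is a hypothetical helper. SyRI's own --tdmaxolp calculation may differ.

```python
def overlap_bp(s1, e1, s2, e2):
    # Overlap of two 1-based inclusive intervals (0 if disjoint)
    return max(0, min(e1, e2) - max(s1, s2) + 1)

def overlap_fraction(s1, e1, s2, e2):
    # ASSUMPTION: fraction relative to the shorter of the two intervals
    shorter = min(e1 - s1 + 1, e2 - s2 + 1)
    return overlap_bp(s1, e1, s2, e2) / shorter

def compatible(t1, t2, max_olp):
    """Hypothetical rule: t1, t2 are (ref_start, ref_end, qry_start, qry_end)
    on the same chromosomes; the pair is kept only if the overlap fraction
    is <= max_olp in the reference AND in the query genome."""
    ref_frac = overlap_fraction(t1[0], t1[1], t2[0], t2[1])
    qry_frac = overlap_fraction(t1[2], t1[3], t2[2], t2[3])
    return ref_frac <= max_olp and qry_frac <= max_olp

trans10269 = (17976077, 18027842, 16813190, 16869297)
trans10270 = (18013990, 18031887, 16855451, 16870570)
print(compatible(trans10269, trans10270, 0.1))   # False
print(compatible(trans10269, trans10270, 0.95))  # True
```

Under this assumed definition the two calls overlap by about 77% (reference) and 92% (query) of the shorter interval, so a cutoff of 0.1 would keep at most one of them.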

jigaoxiang commented 3 years ago

Dear Prof., Thank you for your reply! I tried the latest release and added --tdmaxolp 0.1 to get translocations without overlap, and luckily it worked. However, the number of "INV" entries in syri.out is different from before: with --tdmaxolp there are more INVs than without it. Does adding this parameter affect the identification of other structural variations?

With --tdmaxolp (the number of INV is 183):
C1 5315128 5483618 - - C1 5260941 5290652 INV9962 - INV -
C1 6697125 6760334 - - C1 6425013 6488465 INV9963 - INV -
C1 7330317 7485642 - - C1 6836251 6985273 INV9964 - INV -
...

Without --tdmaxolp (the number of INV is 99):
C1 5313097 5320337 - - C1 5285722 5292423 INV10152 - INV -
C1 7330317 7482922 - - C1 6838960 6985273 INV10153 - INV -
C1 17064682 17326890 - - C1 15952665 16266243 INV10154 - INV -
...
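One way to get counts like 183 vs 99 reproducibly is to tally the annotation-type column (column 11) of each syri.out. The sketch below assumes the standard whitespace-separated syri.out layout; the file path in the commented usage is a placeholder. Note that the numeric part of IDs such as INV9962 is a running identifier, not a count.

```python
from collections import Counter
from typing import Iterable

def count_types(lines: Iterable[str]) -> Counter:
    # Tally the annotation type (11th column, index 10) of every record
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 11:
            counts[fields[10]] += 1
    return counts

# Usage with a real file (placeholder path):
# with open("syri.out") as fh:
#     print(count_types(fh)["INV"])

sample = [
    "C1 5313097 5320337 - - C1 5285722 5292423 INV10152 - INV -",
    "C1 7330317 7482922 - - C1 6838960 6985273 INV10153 - INV -",
    "C1 17976077 18027842 - - C1 16813190 16869297 TRANS10269 - TRANS -",
]
print(count_types(sample)["INV"])  # 2
```

Because the match is exact, related types such as "INVAL" are counted separately from "INV".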

mnshgl0110 commented 3 years ago

Hi Jigao, This is not the expected behavior. --tdmaxolp should not affect the number of inversions. I also just tested this parameter, and it did not change the number of inversions. Are you comparing output from different versions of SyRI? There could be differences between versions.

jigaoxiang commented 3 years ago

Dear Prof.: Thank you for your patience! As you said, I was comparing output from two different versions of SyRI. When I ran the latest version (1.4) without the --tdmaxolp parameter, the inversions were the same. Another question: as shown below, when I add --tdmaxolp 0.1, the overlapping region (C1: 28408237-28527389) is filtered out compared with the run without --tdmaxolp. Is 0.1 too strict? Do you have any suggestion for a value that keeps this translocation but avoids the overlap?

With --tdmaxolp 0.1:
C1 26150070 26154007 - - C1 7839370 7843316 TRANS10174 - TRANS -
C1 29727382 29728530 - - C1 15548786 15549942 TRANS10175 - TRANS -
C1 29928391 29929551 - - C1 27568525 27569687 TRANS10176 - TRANS -

Without --tdmaxolp:
C1 26150070 26154007 - - C1 7839370 7843316 TRANS10195 - TRANS -
C1 28408237 28514158 - - C1 26214924 26307607 TRANS10196 - TRANS -
C1 28484553 28527389 - - C1 26278010 26321181 TRANS10197 - TRANS -
C1 29727382 29728530 - - C1 15548786 15549942 TRANS10198 - TRANS -
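Since the IDs are renumbered between runs (TRANS10174 vs TRANS10195 for the same call), finding which translocations were dropped is easier when records are matched by coordinates. A minimal sketch, using the excerpts above as input; `trans_records` is a hypothetical helper and the whitespace-separated syri.out layout is assumed.

```python
def trans_records(lines):
    # Map (ref chr, ref start, ref end, qry chr, qry start, qry end) -> TRANS ID
    recs = {}
    for line in lines:
        f = line.split()
        if len(f) >= 11 and f[10] == "TRANS":
            key = (f[0], int(f[1]), int(f[2]), f[5], int(f[6]), int(f[7]))
            recs[key] = f[8]
    return recs

with_olp = trans_records([
    "C1 26150070 26154007 - - C1 7839370 7843316 TRANS10174 - TRANS -",
    "C1 29727382 29728530 - - C1 15548786 15549942 TRANS10175 - TRANS -",
    "C1 29928391 29929551 - - C1 27568525 27569687 TRANS10176 - TRANS -",
])
without_olp = trans_records([
    "C1 26150070 26154007 - - C1 7839370 7843316 TRANS10195 - TRANS -",
    "C1 28408237 28514158 - - C1 26214924 26307607 TRANS10196 - TRANS -",
    "C1 28484553 28527389 - - C1 26278010 26321181 TRANS10197 - TRANS -",
    "C1 29727382 29728530 - - C1 15548786 15549942 TRANS10198 - TRANS -",
])

# Calls present without --tdmaxolp but absent with it, in reference order
missing = [without_olp[k] for k in sorted(without_olp) if k not in with_olp]
print(missing)  # ['TRANS10196', 'TRANS10197']
```

On full files the same comparison lists every translocation removed by the overlap filter, not just the ones in this excerpt.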

mnshgl0110 commented 3 years ago

It is difficult for me to say what the suitable overlap threshold would be for your analysis. You can test different values. However, in this case, I would expect one of

C1 28408237 28514158 - - C1 26214924 26307607 TRANS10196 - TRANS -
C1 28484553 28527389 - - C1 26278010 26321181 TRANS10197 - TRANS -

to be present in the output. Because overlaps are checked in both genomes, there may be additional translocations in this region of the query genome. Maybe you can try sorting the output by query genome coordinates.
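The sorting suggested above could be done along these lines: sort the TRANS records by query chromosome and start, then sweep through them and report every pair whose query intervals overlap. This is a minimal sketch, assuming 1-based inclusive coordinates and the whitespace-separated syri.out layout; `query_overlaps` is a hypothetical helper, and the input rows are taken from the run without --tdmaxolp.

```python
def query_overlaps(lines):
    # Collect (qry chr, qry start, qry end, ID) for TRANS records
    recs = []
    for line in lines:
        f = line.split()
        if len(f) >= 11 and f[10] == "TRANS":
            recs.append((f[5], int(f[6]), int(f[7]), f[8]))
    recs.sort()  # by query chromosome, then start, then end
    hits = []
    active = []  # earlier records whose query interval may still overlap
    for chrom, start, end, sv_id in recs:
        # Drop records on another chromosome or ending before this start
        active = [r for r in active if r[0] == chrom and r[2] >= start]
        for _, _, a_end, a_id in active:
            # Sorted order guarantees a_start <= start, so overlap starts at `start`
            hits.append((a_id, sv_id, min(a_end, end) - start + 1))
        active.append((chrom, start, end, sv_id))
    return hits

rows = [
    "C1 26150070 26154007 - - C1 7839370 7843316 TRANS10195 - TRANS -",
    "C1 28408237 28514158 - - C1 26214924 26307607 TRANS10196 - TRANS -",
    "C1 28484553 28527389 - - C1 26278010 26321181 TRANS10197 - TRANS -",
    "C1 29727382 29728530 - - C1 15548786 15549942 TRANS10198 - TRANS -",
]
print(query_overlaps(rows))  # [('TRANS10196', 'TRANS10197', 29598)]
```

Keeping a list of still-open intervals, rather than only comparing neighbours, also catches a long translocation that overlaps several later ones.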

jigaoxiang commented 3 years ago

OK! Thank you for your suggestion! Best wishes to you and your family!