schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Request for Guidance on Converting NOTALs to Presence in Syri Outputs #246

Open Lancer-sudo-png opened 2 months ago

Lancer-sudo-png commented 2 months ago

Dear @mnshgl0110, I hope this message finds you well. I am reaching out to you because I am currently utilizing Syri, which I find to be an exceptionally useful tool for calling structural variants (SVs). However, I have encountered some challenges with presence/absence variations (PAVs) that I hope you can help clarify.

After reviewing the discussion in issue #107, I agree that NOTALs could be considered PAVs. My specific question concerns how to convert a NOTAL found in the query sequence into a "Presence" status. In the SyRI output file (syri.out), coordinates are given only for the query genome, with no corresponding coordinates for the reference genome, like this:

NOTAL-query pic

Could you please advise on how to appropriately define a NOTAL-query as "Presence" in a VCF file format? Any guidance or suggestions you could provide would be greatly appreciated.

Thank you for your time and assistance.

Best regards, Baoyue

mnshgl0110 commented 2 months ago

Consider the following example, where the syntenic regions (between the reference and query genomes) are in blue, the translocation is in red, and the NOTAL region is shown as a green circle.

image

Now, compared to the reference, the notal could correspond to either of the two locations marked with arrows. As there is no obvious answer as to which position "mutated" to generate the notal, we do not assign a reference coordinate to notals in query.

But, I guess for your task, it might be ok to assign one of the two values (as long as it is clearly described). You will need to fetch the neighboring blocks in the query genome and then get the corresponding breakpoints in the reference.

Lancer-sudo-png commented 2 months ago

Thank you for your response! @mnshgl0110

Indeed, I have identified a pattern based on adjacent annotation blocks:

Subtracting 1 from the left breakpoint of a NOTAL-query yields the right breakpoint of the adjacent block on the left. Using that right breakpoint, I can locate the block's corresponding segment on the reference genome. Similarly, adding 1 to the right breakpoint of a NOTAL-query yields the left breakpoint of the adjacent block on the right, and I can locate that block's corresponding segment on the reference genome using this left breakpoint.
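The pattern described above can be sketched in Python. This is only an illustration under stated assumptions, not part of SyRI: it assumes the 12-column syri.out layout (reference chromosome, start, end, reference sequence, query sequence, query chromosome, start, end, ID, parent ID, type, copy status), and the names `Row`, `parse_row`, and `find_flanks` are made up for this example. Query coordinates are sorted before comparison in case a block lists them in descending order.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One syri.out record, keeping only the columns used here (hypothetical helper)."""
    ref_chr: str
    ref_start: Optional[int]
    ref_end: Optional[int]
    qry_chr: str
    qry_start: Optional[int]
    qry_end: Optional[int]
    uid: str
    parent: str
    kind: str


def _to_int(field: str) -> Optional[int]:
    # syri.out uses "-" for missing values
    return None if field == "-" else int(field)


def parse_row(line: str) -> Row:
    # Assumes 12 whitespace- or tab-separated columns per record.
    f = line.split()
    return Row(f[0], _to_int(f[1]), _to_int(f[2]),
               f[5], _to_int(f[6]), _to_int(f[7]),
               f[8], f[9], f[10])


def find_flanks(rows, notal):
    """Return (left, right) top-level blocks that directly abut a query NOTAL."""
    lo, hi = sorted((notal.qry_start, notal.qry_end))
    left = right = None
    for r in rows:
        # Skip nested records (alignments, SNPs, ...), other NOTALs,
        # other query chromosomes, and rows without query coordinates.
        if (r.parent != "-" or r.kind == "NOTAL"
                or r.qry_chr != notal.qry_chr or r.qry_start is None):
            continue
        r_lo, r_hi = sorted((r.qry_start, r.qry_end))
        if r_hi == lo - 1:
            left = r
        elif r_lo == hi + 1:
            right = r
    return left, right


# Records taken from the example posted in this thread.
lines = [
    "Chr1 20046 29279 - - Chr1 27494 36664 SYN3 - SYN -",
    "- - - - - Chr1 36665 37065 NOTAL27497 - NOTAL -",
    "Chr3 4532378 4532617 - - Chr1 37066 37306 INVDP47167 - INVDP copygain",
]
rows = [parse_row(l) for l in lines]
notal = rows[1]
left, right = find_flanks(rows, notal)
print(left.uid, left.ref_chr, left.ref_end)        # SYN3 Chr1 29279
print(right.uid, right.ref_chr, right.ref_start)   # INVDP47167 Chr3 4532378
```

On a full syri.out file, a linear scan per NOTAL may be slow; indexing the top-level blocks by query chromosome and breakpoint would be the natural next step.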

However, I found that the segments on the reference genome identified using the adjacent block breakpoints may be located on different chromosomes. In such cases, how should I describe this NOTAL-query (i.e., what we consider as Presence) using the coordinates of the reference genome? Here is my variant information and a schematic diagram:

image

NOTAL-query:
    - - - - - Chr1 36665 37065 NOTAL27497 - NOTAL -

The left block of the NOTAL-query:
    Chr1 20046 29279 - - Chr1 27494 36664 SYN3 - SYN -

The right block of the NOTAL-query:
    Chr3 4532378 4532617 - - Chr1 37066 37306 INVDP47167 - INVDP copygain

mnshgl0110 commented 2 months ago

Indeed, that is the complication in describing Notals using reference genomes as sometimes there is no obvious answer. I guess, you can pick one convention for your analysis and then use that. Unfortunately, I do not have a more helpful answer.
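For readers who want to try the "pick one convention" route, here is one possible convention sketched in Python. Everything in it is an assumption made for illustration, not something SyRI prescribes: prefer a syntenic (SYN) flank over a rearranged one, fall back to the left flank otherwise, and for inverted flanks (INV, INVTR, INVDP) take the opposite reference end as the breakpoint that faces the NOTAL. The functions `ref_anchor`, `pick_anchor`, and `vcf_line` and the INFO keys `QRYCHR`, `QRYSTART`, `QRYEND`, and `ANCHOR` are hypothetical; `SVTYPE`, `SVLEN`, `END`, and the symbolic allele `<INS>` come from the VCF specification.

```python
INVERTED = {"INV", "INVTR", "INVDP"}  # block types treated as inverted (assumption)


def ref_anchor(block, side):
    """Reference breakpoint of a flanking block that faces the NOTAL.

    block: dict with keys ref_chr, ref_start, ref_end, kind, uid
    side:  "left" or "right", the block's position relative to the NOTAL in the query
    """
    inverted = block["kind"] in INVERTED
    if side == "left":
        pos = block["ref_start"] if inverted else block["ref_end"]
    else:
        pos = block["ref_end"] if inverted else block["ref_start"]
    return block["ref_chr"], pos


def pick_anchor(left, right):
    """Convention used here: a SYN flank wins; otherwise the first available flank."""
    candidates = [(left, "left"), (right, "right")]
    for block, side in candidates:
        if block is not None and block["kind"] == "SYN":
            return ref_anchor(block, side) + (block["uid"],)
    for block, side in candidates:
        if block is not None:
            return ref_anchor(block, side) + (block["uid"],)
    return None


def vcf_line(notal_id, qry_chr, qry_start, qry_end, anchor):
    """Format one symbolic-allele VCF record; REF is a placeholder 'N'."""
    chrom, pos, source = anchor
    svlen = qry_end - qry_start + 1
    info = (f"SVTYPE=INS;SVLEN={svlen};END={pos};"
            f"QRYCHR={qry_chr};QRYSTART={qry_start};QRYEND={qry_end};ANCHOR={source}")
    return "\t".join([chrom, str(pos), notal_id, "N", "<INS>", ".", ".", info])


# Flanking blocks from the example in this thread.
left = {"ref_chr": "Chr1", "ref_start": 20046, "ref_end": 29279,
        "kind": "SYN", "uid": "SYN3"}
right = {"ref_chr": "Chr3", "ref_start": 4532378, "ref_end": 4532617,
         "kind": "INVDP", "uid": "INVDP47167"}

anchor = pick_anchor(left, right)   # ("Chr1", 29279, "SYN3")
record = vcf_line("NOTAL27497", "Chr1", 36665, 37065, anchor)
print(record)
```

With this convention, the NOTAL above is reported at Chr1:29279 because its left flank is syntenic, while the Chr3 flank is kept out of the coordinate. Recording the chosen flank in the record (here via the hypothetical `ANCHOR` key) keeps the choice traceable, which matches the advice that the convention should be clearly described. A real VCF would also need matching `##INFO` header lines for the custom keys and the actual reference base in the REF column.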