schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

DUP (duplication) and INVDUP (inverted duplication) #102

Closed: clairemerot closed this issue 3 years ago

clairemerot commented 3 years ago

Hello, I want to make sure that I understand the output. For the regions annotated as "DUP", I have:

   chr_ref start_ref  end_ref ref alt chr_qry start_qry  end_qry   sv_id parent_id type copy_stat len_ref len_qry
1:   Chr01   1026643  1027140   -   -   Chr33   7446677  7447173 DUP2845 -  DUP  copygain         497     496

As they are "copygain", they should be present twice in the query genome, right? Yet here only one position in the query genome is given. Does this mean that the first copy of the duplicated region is at the same position as in the reference (e.g. Chr01), and the second copy is at the position indicated for the query (e.g. Chr33)? Same question for INVDUP. Thanks for your help! Claire

mnshgl0110 commented 3 years ago

As they are "copygain", they should be twice in the query genome right?

Yes

Does this mean that the 1st copy of the duplicated region is present at the same position as in the reference (e.g. Chr01)

Most of the time, but not always. The first copy may itself be inverted or translocated, in which case SyRI identifies and reports it as such.

the second position indicated for query (eg. chr33?)

Yes

Same question for INVDUP.

Same as DUP
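In other words, each DUP or INVDUP row gives the reference region plus the location of the gained copy in the query. A minimal Python sketch for pulling these rows out of a syri.out-style file is below. It assumes the standard 12 tab-separated columns (the len_ref and len_qry columns in the table above appear to have been added by the poster), and the names `DupRecord` and `gained_copies` are hypothetical, not part of SyRI.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class DupRecord:
    chr_ref: str
    start_ref: int
    end_ref: int
    chr_qry: str
    start_qry: int
    end_qry: int
    sv_id: str
    sv_type: str
    copy_stat: str


def gained_copies(lines: Iterable[str]) -> Iterator[DupRecord]:
    """Yield DUP/INVDUP rows whose copy status is 'copygain'.

    Assumed column order (syri.out): ref chr, ref start, ref end, ref seq,
    query seq, query chr, query start, query end, unique ID, parent ID,
    annotation type, copy status.
    """
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip blank or truncated lines
        # Filter before converting to int: other row types (e.g. NOTAL)
        # can carry "-" instead of coordinates.
        if f[10] not in ("DUP", "INVDUP") or f[11] != "copygain":
            continue
        yield DupRecord(f[0], int(f[1]), int(f[2]), f[5], int(f[6]),
                        int(f[7]), f[8], f[10], f[11])
```

Each yielded record pairs the reference region (chr_ref:start_ref-end_ref) with the additional query copy (chr_qry:start_qry-end_qry). As noted above, the first copy is usually at the syntenic position but may be reported separately as an inversion or translocation.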

If you want to check the original loci of DUP regions in detail, you can try the regAnno script (in the bin folder), which I wrote for personal testing. It fetches the annotations of a genomic region. However, it only reads the intermediate files (synOut.txt, invOut.txt, etc.) and expects exactly these names, so run it in your working directory:

$ regAnno region Chr01 215099 218756
synOut.txt  Chr01   1   215102  Chr01   1   215102
TLOut.txt   Chr01   215099  218756  Chr01   39075877    39079534
synOut.txt  Chr01   218750  307911  Chr01   218750  307911
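regAnno's source is not reproduced here, but the core idea behind the output above, listing every annotation whose reference interval overlaps a region, can be sketched in Python. This is a hypothetical illustration, not regAnno's implementation: the tuple layout and the function name `overlapping` are assumptions, and parsing the intermediate files is left out because their column layout is not described in this thread.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout:
# (source file, ref chr, ref start, ref end, query chr, query start, query end)
Annotation = Tuple[str, str, int, int, str, int, int]


def overlapping(annotations: Iterable[Annotation], chrom: str,
                start: int, end: int) -> List[Annotation]:
    """Return annotations whose reference interval overlaps [start, end].

    Coordinates are treated as 1-based and inclusive, so two intervals
    overlap when each one starts no later than the other one ends.
    """
    return [a for a in annotations
            if a[1] == chrom and a[2] <= end and start <= a[3]]
```

With the three records shown above, a query for Chr01:215099-218756 returns all three, because the flanking syntenic blocks overlap the translocated region by a few bases.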
clairemerot commented 3 years ago

Thanks a lot for your answer, I'll try to figure it out.

clairemerot commented 3 years ago

I'm sorry, I have been trying to figure this out, but I don't get it. I want to build an informative VCF with the reference sequence and the alternative (query) sequence for these complex rearrangements (DUP, INVDUP, TRANSINV, etc.). Is this possible? Or would you recommend against it? Or should I do it but leave out the rearrangements that span two different chromosomes? I am really sorry to bother you with this; I really appreciate the tool and the follow-up on GitHub. Thanks! Claire

mnshgl0110 commented 3 years ago

SyRI should have generated a VCF output file that contains all structural rearrangements. Or do you need something different?

(Reply sent by email to Claire Mérot's comment of Wed 20 Oct 2021, 7:59 PM.)

clairemerot commented 3 years ago

Yes, the VCF is fine, but it includes REF/ALT sequences only for small SVs. I was trying to also get sequences for the larger rearrangements like INV, DUP, etc. Thanks a lot!

mnshgl0110 commented 3 years ago

I see, thanks for clarifying. I don't think there is a direct answer to this question; it rather depends on your objectives. As far as I understand the VCF format, you can add inversions as ALT sequences and translocations as breakpoints, but I am not sure how duplications would work. However, in my opinion, structural rearrangements (SRs: INV, TRANS, DUP, etc.) are regions that have the "same" sequence but at a different location or in a different orientation. So an ideal SR (without any nested small SVs) would have exactly the same sequence in both genomes, and there would be no real REF vs ALT sequence. In practice, I would guess that adding these sequences would make the VCF file unnecessarily large without adding much information. I hope this helps.
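To illustrate the point about ideal SRs: for an inversion with no nested small variants, the ALT sequence is fully determined by REF (it is its reverse complement), so writing both roughly doubles the storage without adding information. The Python sketch below shows this; it is not how SyRI writes its VCF, and the function names are hypothetical. For large events, the VCF specification also defines symbolic alleles such as `<INV>` and `<DUP>` (with the event end in the `END` INFO field), which avoid storing the sequence at all.

```python
from typing import Tuple

# Complement table for upper- and lowercase DNA; N maps to N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def ideal_inversion_alleles(ref_seq: str) -> Tuple[str, str]:
    """REF/ALT pair for an inversion with no nested small variants.

    ALT carries no information beyond REF: it is just its reverse complement.
    """
    return ref_seq, reverse_complement(ref_seq)
```

For example, `ideal_inversion_alleles("ACGTTG")` returns `("ACGTTG", "CAACGT")`. Any real difference between the two genomes inside the inversion would show up as small variants, which SyRI already reports separately.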