mahulchak / svmu

A program to call variants from genome alignment
GNU General Public License v3.0

confused about the results: sv.txt and cnv.txt #25

Open QianghuiZhu opened 1 year ago

QianghuiZhu commented 1 year ago

Hi! SVMU is a pretty good tool for detecting SVs.

But I am confused about the result files, sv.xxx.txt and cnv.xxx.txt.

In my understanding, CNVs are a subset of SVs. The simple SV types are deletion (DEL), duplication (DUP), inversion (INV), insertion (INS), and translocation (TRA), while CNVs are the unbalanced variants: DEL and DUP, and sometimes INS.

But in the result files, SVs and CNVs are split into two files. Some intervals overlap between the two files, but others are unique to cnv.xxx.txt or sv.xxx.txt. So, for downstream analysis, may I merge the two files (merging overlapping intervals) to get more SV sites?

Thanks a lot!

mahulchak commented 1 year ago

The SV.xx.txt file has a comprehensive list of SVs, but some CNVs, especially those in highly repetitive sequences, can be missed and so are absent from this file. It's probably okay to combine CNVs from the two files, but do check a few of the CNVs unique to the cnv file to make sure they are true CNVs. I hope this is helpful. Let me know if you have any other questions.

-- Mahul Chakraborty Assistant Professor Department of Biology Texas A&M University Phone: 949 824 9559 Fax: 979-845-2891 Website: https://mahulchakraborty.wordpress.com/ Github: https://github.com/mahulchak

QianghuiZhu commented 1 year ago

Many thanks to you!

I'll try to merge these two files and filter out the overlapping sites.
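For anyone trying the same thing, here is a rough sketch of the merge, plus a helper that lists the intervals found only in the cnv file so they can be checked by hand as suggested above. It assumes the intervals have already been read from the two files as (chromosome, start, end) tuples; the function names and the example values are made up for illustration and are not part of svmu.

```python
from collections import defaultdict

def merge_intervals(records):
    """Merge overlapping (chrom, start, end) intervals, chromosome by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        # Normalise so that start <= end, in case a record is reversed.
        by_chrom[chrom].append((min(start, end), max(start, end)))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlaps the running interval: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged

def unique_to_first(first, second):
    """Return intervals in `first` that overlap no interval in `second`."""
    return [
        (c, s, e)
        for c, s, e in first
        if not any(c == c2 and s <= e2 and s2 <= e for c2, s2, e2 in second)
    ]

# Made-up example intervals, not real svmu output.
sv = [("chr1", 100, 200), ("chr1", 500, 600)]
cnv = [("chr1", 150, 250), ("chr2", 10, 50)]

merged = merge_intervals(sv + cnv)
cnv_only = unique_to_first(cnv, sv)
print(merged)
print(cnv_only)
```

The `cnv_only` list is the part worth inspecting manually before trusting the merged set. The pairwise overlap check is quadratic, which should be fine for a few thousand calls but would need an interval index for much larger files.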

QianghuiZhu commented 1 year ago

Hi! I'm sorry to bother you, but I have a small question: are the coordinates in the result files 0-based or 1-based? Thank you!

mahulchak commented 1 year ago

To the best of my knowledge, they're 1-based.
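If that holds, the intervals need shifting before they go into BED-based tools, which expect 0-based, half-open coordinates. A minimal sketch of the conversion follows; it assumes the svmu intervals are 1-based with an inclusive end, which has not been confirmed beyond the statement above, and the helper names are hypothetical.

```python
def to_bed(chrom, start, end):
    """Convert a 1-based, inclusive interval to a 0-based, half-open BED interval."""
    if start > end:
        # Assumption: reversed coordinates may occur; BED needs start <= end.
        start, end = end, start
    return (chrom, start - 1, end)

def from_bed(chrom, start, end):
    """Convert a 0-based, half-open BED interval back to 1-based, inclusive."""
    return (chrom, start + 1, end)

# Made-up example interval, not real svmu output.
bed = to_bed("chr1", 100, 200)
print("\t".join(map(str, bed)))
```

Only the start shifts by one; the end stays the same, so the interval length is simply `end - start` in BED terms.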

QianghuiZhu commented 1 year ago

Thank you again for your reply.

Maxim-Karpov commented 1 year ago

Hi, what do the columns in the coords and cnv files represent? The files were produced without headers.

QianghuiZhu commented 1 year ago

I did not get any header information either. My guess is that the columns are: ref_chr ref_start ref_end query_chr query_start query_end ref_copy query_copy. I am not sure about the last two columns, though; I only use the coordinates.
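In case it helps, this is roughly how the rows could be read under that guess. The column names are only the guessed layout above, not anything confirmed by the developer; the class name, the whitespace-delimited format, and the example line are also assumptions. The last two columns are kept as raw text because their meaning is unknown.

```python
from typing import NamedTuple

class GuessedRow(NamedTuple):
    """One row under the GUESSED column layout; names are unconfirmed."""
    ref_chr: str
    ref_start: int
    ref_end: int
    query_chr: str
    query_start: int
    query_end: int
    ref_copy: str    # meaning unconfirmed, kept as raw text
    query_copy: str  # meaning unconfirmed, kept as raw text

def parse_row(line):
    """Split one whitespace-delimited line into a GuessedRow."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    return GuessedRow(
        fields[0], int(fields[1]), int(fields[2]),
        fields[3], int(fields[4]), int(fields[5]),
        fields[6], fields[7],
    )

# Made-up example line, not real svmu output.
row = parse_row("chr1\t100\t200\tctg5\t90\t190\t1\t2")
print(row.ref_chr, row.ref_start, row.ref_end)
```

If the real files have a different number or order of columns, the length check fails loudly instead of silently mislabelling fields.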

Maxim-Karpov commented 1 year ago

Thank you! I hope the developer can chime in on this as well.