simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

Identification of viral like genes #93

Open 473021677 opened 1 month ago

473021677 commented 1 month ago

Hi Simon, I am trying to identify viral auxiliary metabolic genes (AMGs) using VirSorter v2.2.4 and DRAM v1.2.0 (viral mode; --skip_trnascan). The “viral-affi-contigs-for-dramv.tab” file was generated by VirSorter v2.2.4 (--prep-for-dramv) and then used by DRAM v1.2.0 to screen for putative viral AMGs. However, I have a question about this file. According to the "Result files" section at https://github.com/simroux/VirSorter#result-files, it is a pipe ("|")-delimited table listing the annotation of all predicted ORFs in all contigs, and the category of a virus cluster represents the range of genomes in which that cluster was detected, e.g. 0: hallmark gene found in Caudovirales, and 3: hallmark gene not found in Caudovirales. So categories 0 and 3 represent the viral hallmark genes. But I cannot tell which genes are viral-like genes, which DRAM-v also uses to determine the auxiliary scores and flag assignments of viral AMGs. My guess is that categories 1 and 2 in the “viral-affi-contigs-for-dramv.tab” file represent the viral-like genes. I want to identify both the viral hallmark genes and the viral-like genes so that I can draw viral genome organization diagrams. Could you help? Thanks.

Best regards, Yang Yuan

simroux commented 1 month ago

Hi Yang, Yes, viral-like genes are categories 1 and 2, with 4 and 5 for genes found in non-Caudovirales (but none of these are hallmark genes, i.e. they are often also found in bacterial/archaeal genomes).

Best, Simon

473021677 commented 1 month ago

Hi Simon, Thanks for your explanation. However, according to the "DRAM-v in detail" section at https://github.com/WrightonLabCSU/DRAM/wiki/1.-How-DRAM-Works#dram-v-in-detail, a viral-like gene should be a VIRSorter protein cluster with category 1 or 4. I am not sure whether genes with category 1 or 4 in the “viral-affi-contigs-for-dramv.tab” file can be regarded as viral-like genes. I have attached the "viral-affi-contigs-for-dramv_practice.tab.txt" file. If genes with category 1 or 4 are viral-like genes, then "DPZ5_202101_spades_4293__22" and "DPZ5_202101_spades_4293__25" in that file should be viral-like genes. Could you help? Thanks. viral-affi-contigs-for-dramv_practice.tab.txt

Best, Yang Yuan

simroux commented 1 month ago

I am not sure I understand the question, but you may want to double-check the scores as well. DPZ5_202101_spades_4293__22 and DPZ5_202101_spades_4293__25 each have a hit to a profile of category 1; however, the score of these hits is pretty low (~30), so if you don't see them as viral-like genes in DRAM-v, that's likely because of this low score.

473021677 commented 1 month ago

Hi Simon, I am sorry that my question was not clear. I want to identify the viral-like genes based on the categories of the VIRSorter protein clusters, not the auxiliary score. According to the "Result files" section at https://github.com/simroux/VirSorter#result-files, the categories of virus clusters in the “viral-affi-contigs-for-dramv.tab” file (--prep-for-dramv for VirSorter v2.2.4) represent the range of genomes in which each virus cluster was detected, i.e. 0: hallmark gene found in Caudovirales, 1: non-hallmark gene found in Caudovirales, 2: non-hallmark gene found exclusively in virome(s), 3: hallmark gene not found in Caudovirales, 4: non-hallmark gene not found in Caudovirales. So there are five categories for the VIRSorter protein clusters: 0, 1, 2, 3, 4. Categories 0 and 3 represent the viral hallmark genes. The "DRAM-v in detail" section at https://github.com/WrightonLabCSU/DRAM/wiki/1.-How-DRAM-Works#dram-v-in-detail states that a viral-like gene should be a VIRSorter protein cluster with category 1 (non-hallmark gene found in Caudovirales) or 4 (non-hallmark gene not found in Caudovirales), and that is what I am trying to confirm. Thanks.

Best regards, Yang Yuan

simroux commented 1 month ago

Hi, If the question is: "what category are viral-like genes?", then yes, these are categories 1 and 4. Best, Simon
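For anyone who wants to apply this when preparing genome organization diagrams, here is a minimal Python sketch of the mapping discussed in this thread: categories 0 and 3 are treated as hallmark genes, and categories 1 and 4 as viral-like genes. The category descriptions are the ones quoted above. The rest is an assumption and should be checked against your own file: the function names are hypothetical, the file layout (pipe-delimited gene lines, header lines starting with ">", gene name in the first column) is assumed, and the position of the category column (`category_col`, zero-based, default 8) is a guess that you may need to change. The example lines are made up for illustration and are not taken from the attached file. This sketch also ignores the hit score, which, as noted above, can affect what DRAM-v reports.

```python
# Category meanings as quoted in this thread from the VirSorter "Result files" section.
CATEGORY_LABELS = {
    0: "hallmark gene found in Caudovirales",
    1: "non-hallmark gene found in Caudovirales",
    2: "non-hallmark gene found exclusively in virome(s)",
    3: "hallmark gene not found in Caudovirales",
    4: "non-hallmark gene not found in Caudovirales",
}

HALLMARK_CATEGORIES = {0, 3}    # hallmark genes, as stated in the thread
VIRAL_LIKE_CATEGORIES = {1, 4}  # viral-like genes, as confirmed in the thread


def classify_category(category):
    """Return 'hallmark', 'viral-like', or 'other' for a category number."""
    if category in HALLMARK_CATEGORIES:
        return "hallmark"
    if category in VIRAL_LIKE_CATEGORIES:
        return "viral-like"
    return "other"


def classify_affi_lines(lines, category_col=8):
    """Map gene name -> class for the gene lines of an affi-contigs table.

    ASSUMPTIONS (not confirmed in this thread): gene lines are pipe-delimited,
    header lines start with '>', the gene name is the first field, and the
    category sits at zero-based column `category_col`. Genes whose category
    field is not an integer (for example, no hit) are skipped.
    """
    classes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and contig header lines
        fields = line.split("|")
        if len(fields) <= category_col:
            continue  # line too short to contain a category field
        try:
            category = int(fields[category_col])
        except ValueError:
            continue  # no usable category for this gene
        classes[fields[0]] = classify_category(category)
    return classes


# Made-up example lines, for illustration only.
example = [
    ">contig_1|3|l",
    "contig_1__1|1|300|300|+|PC_a|120|1e-30|0|-|-|-",
    "contig_1__2|350|800|450|+|PC_b|60|1e-10|1|-|-|-",
    "contig_1__3|900|1200|300|-|-|-|-|-|-|-|-",
]
print(classify_affi_lines(example))
```

To run it on a real file, something like `classify_affi_lines(open("viral-affi-contigs-for-dramv.tab"))` should work once `category_col` matches your file's layout. Category 2 falls into "other" here because this sketch follows the final answer in this thread (1 and 4); adjust `VIRAL_LIKE_CATEGORIES` if you need a different definition.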

473021677 commented 1 month ago

Hi Simon, Thanks for your explanation. My question has been solved.

Best regards, Yang Yuan