Open susutBu opened 4 years ago
Right, filtering is recommended. The current default cutoff (0.5) was set based on simulated virome data from known genomes. With real metagenomic data, which contain more unknown sequences, the default cutoff tends to give high sensitivity at the cost of more false positives. Shorter sequences also tend to have a higher false positive rate, and a higher proportion of cellular sequences can increase it as well. So no single cutoff fits all cases, which is why we leave the choice to users.
From my experience so far, I recommend >0.75. There are a few options for filtering:
- Filter by score using the `final-viral-score.tsv` table.
- Use the `--min-score` option.
- The `--hallmark-required-on-short` option requires sequences shorter than 5K to have a hallmark gene.
- For highest confidence, pick only sequences with hallmark genes, using the hallmark field in the FASTA header in `final-viral-combined.fa`.

Thank you for your detailed explanation. This information is very useful.
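The score-based filtering suggested above can be sketched in a few lines of Python. This is only an illustration: the column names `seqname` and `max_score` and the tiny sample table are assumptions made for the example, so check the header row of your own `final-viral-score.tsv` before reusing it.

```python
import csv
import io


def filter_scores(tsv_text, min_score=0.75, score_col="max_score"):
    """Return the rows of a tab-separated score table whose score exceeds min_score.

    score_col is an assumed column name; adjust it to match your table.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if float(row[score_col]) > min_score]


# Made-up table standing in for final-viral-score.tsv (hypothetical contents).
sample = (
    "seqname\tmax_score\tmax_score_group\n"
    "contig_1||full\t0.94\tNCLDV\n"
    "contig_2||full\t0.62\tdsDNAphage\n"
)

kept = filter_scores(sample)
print([row["seqname"] for row in kept])
```

For a real run, read the file contents with `open(path).read()` and pass them to `filter_scores`. The 0.75 threshold is the value recommended in the reply above, not a universal rule.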
I have a question about `final-viral-score.tsv`: some FASTA headers in `final-viral-combined.fa` look similar and have high scores, but they do not all appear in `final-viral-score.tsv`.
For example, this one is in `final-viral-score.tsv`:
>NODE_224_length_4468_cov_41.505099||full shape:circular||start:97||end:2840||group:NCLDV||score:0.94||hallmark:0
This one is not in `final-viral-score.tsv`:
>NODE_255_length_3849_cov_51.199262||full shape:circular||start:3223||end:3600||group:dsDNAphage||score:0.887||hallmark:0
I don't know what criteria this file uses for filtering. Thanks!
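Headers like the two quoted above carry their metadata as `key:value` pairs separated by `||`. A minimal sketch of pulling those fields apart follows; the function name `parse_header` is hypothetical, and the parsing relies only on the header layout visible in this thread.

```python
def parse_header(header):
    """Split a header of the form shown above into (sequence name, field dict).

    Everything before the first space is the name; the remainder is a
    '||'-separated list of key:value pairs.
    """
    name, _, rest = header.lstrip(">").partition(" ")
    fields = {}
    for item in rest.split("||"):
        key, sep, value = item.partition(":")
        if sep:  # skip anything that is not a key:value pair
            fields[key] = value
    return name, fields


example = (
    ">NODE_224_length_4468_cov_41.505099||full shape:circular||start:97"
    "||end:2840||group:NCLDV||score:0.94||hallmark:0"
)
name, fields = parse_header(example)
print(name, fields["group"], fields["score"], fields["hallmark"])
```

All values come back as strings, so convert `score` with `float` and `hallmark`, `start`, and `end` with `int` before comparing them.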
@hjdong, do you mean these two sequences are both in `final-viral-combined.fa`, but you cannot find the second one in `final-viral-score.tsv`? If so, could you paste the sequence of `>NODE_255_length_3849_cov_51.199262||full` here? I need to look into it. Thanks.
Yes, the two sequence names are `>SRR9161490_NODE_224_length_4468_cov_41.505099||full` and `>SRR9161490_NODE_255_length_3849_cov_51.199262||full`:
>SRR9161490_NODE_216_length_4595_cov_145.989207||full shape:circular||start:562||end:4021||group:NCLDV||score:1.0||hallmark:0
TGAATAATGTCTCATATGAGGCAATAAGACAACAAGTAAAAAGATACGAAGATGAACTTAACGGACATATTATTAAGCAAAATAGAACACAATTCTTAGATGATGTAGCAGTAGATATTTTAGATCAACACAGGAAAGAAAATCCTGTTGTAATTATCAATAAAGACACTGATTCTAGGTTAAAACAATTAGAAGATGAGAACAAAAATCTGTTAATTAAAGTTGCACAACAGGCAGATAAGATATCTCAATTGAATGAAGATTTAAAAAATAAAATAGAACAAATGACTTCATTATTGTTAGAAAATAATGAGAAAACACTTCTGCTAGAGCAAAAAAAAGACCAGGCAGAAGAAATAAATCAATTAAAAGAACAACTGGATCGAGAGAAAAAAGAACAGTTAAATGAGATAGAAATGTTAAAAAATGAATTAGAAAAAGAAAAAAATAAAGGTTTCTTTGCACGTTTATTTGGAAAATAAACTGACTTTATTACACACAATCAGTGTTTAAATTGAACACTGATTTTTAATTTTTCGATACATAAAAATGACAAAAATAAAAATTTTAAGCACCCATTTTTGAAAAAATTGAGCACTTAATAAATGCAAAAATGATTTTTTAGATCAACAATTTTTAATTAAAAATGGATTTTAAAATTGAACTTTTGACACAAAAAAATGGCAAAAACAAAAATTTTAAGCACTCATTTTTGAAAAAATTAGCATTTAATAAATGCAAAAATAATTTTTTTAGATCAACATTTTTAATTTAAAAATGCACTTTAAAAATGAAATTTTGACACATTAAAATGACAAAAATAAAAATTTTAAGCACCCATTTTTGAAAAATTGAGCACTAAATAAATGAAAAAATAATTTTTTAGATCAATATTTTTAATTTAAAAATGCACTTTAAAAATGAATTTTAGACACAATAAAAAAACGCAAAATTAGTAAAAATGCCATTTTCTTCAAAAACGGCGATTTTAGGTCATTTTTGGGGTTTTAAAAATAGCAAGTAAGGACTTTGTGCATACACTGGAATCCCCTAAGGACTTTGTGCATACAAAGTCCTTGCGATACCTGTGCATACAAAGTCAAAAATGTGTTGTACACACATTTTATTTTTTGAGGATTTCATCCCCAAACCCCTTCGTAAAATCAATGTCTACAGTTGTAGCAAGCAACGTCTTTGGATGTCCAAAGACAGACAATGTACACTTCAACAGTGATAAAAAAACTGTCCTTTTAGAGATACGGTTTGAAGAGAGTATAGCAATGCGTTGGGTTTTTAAGATACTAGATTATGATACAATCTTTTTGTTGCCTAGGCTCACCAACGCGAAGGAGGTGAGAACGTGGAATTTGTTTGCAACTTTATAGTTTCTGTCGGAGCAAGTGTGGTCGCCTACTACATTTGCAAGTGGCTTGACGGAAATGACTAGGCAGCAAAAGCACAAACTGAGTGGATTGACTACCCACTCTTTTTTTATGTAAAAAAAAGAAGAACCGAGTGACTAGTTCAGTTCTTCTTAACGTGGCTTTTGTTTGCAACTTTGCCTACTTGAATTATATCATGAACTTTTCAAAAGTAAACGTCCGAGACTTAGTATGCGCTAAGTACGAGGACACGTACTTATTGACTACCTTTAAATCGTATCATGTATTTTAGATCATTAGTAGACTGTGCTTTTAAATTTATACTTGTTAAAATCATTGTATAAAGAACACCTAGAGAATGAAAAAGGAGCTTTGATACTAATGGACGGTATAGTGGCTTTTATAGTTGGGACACTTATTGGAATAGCTATTCATAGCTACATAAAAAAAAGAAAGAAAAAGGAATAGGCCTAATGTCATTAAGTTAAGGACATTATATTGAGGCACCTATATTTTCATACAGGTGTCCTCTTTTTTACTCTAGCATTGTACAGAGAAGTATTTACCATAGTGTGCTAGAGTAAAGCAGTGACTGGACACCGATAAAAATCGGTGTTA
TATTTTTTAGTGAAATAGGGTCAATAACTTCATTGATAATATAACGTGACCTAGGGGCCTCTTGGCCTTTGATTCTTTTTTGTCCTTTTATGTTTGTTTTAGAGAGGTGTTCATAACAATGTTAAGTGATGTTATTAAGAAGTATGGTGGAGAGAAGATGGATCCACTGGATGTATATAAAGATATTTTTAGAATTGGAGAAGGATTCATTCAGAAGGAGTATGAAGATAGTGGAAGTTTCAAAGCAAATCCAATTGCCTATTATAAGAATGAGAACGAAGATCATGGCCATTTCAGAATTATGTTTGAAGATAAGTTTGAAGAAATCTATCGAAATGAGCTTGTGAATGCAGATTTCTGTGTAATGAATGGTTTAACTTACTTTGGTGCAAAATATACATCGGATAGAGCTTCTAAAATGTGTGCATTGATATTTGATATTGATGGTGTTACAGATAATAGTTTGAATAATTTTTTTTATGCTGCATTTAATAAAGAATTTGATTATTATCCATTACCAAATTACGTGGCTTTAAGTGGGCATGGAATACATTTATATTATGTTTTTGAAGAACCAGTACCATTGTTCCCTAATTTGAAGCTTCAATTAAAGGAATTTAAATACTCTTTAACTGAAAAAATGTGGAACAAAAATACTTCTGTTGATGAGAAAGTACAAAAACAAGGAATCAATCAGCCTTTTAGAATATTGGGTGGAAAATGTAAAAAGAATGCTCCACTGGATAGAGTGGAAGTGTATAGAGTAAATCAGCATCCAGTCAACATAGAGTATTTGAATCGTTTTGTTCCCACTAAAATTGAGATTGATGAAAAAAAATTATTCAAGGAAAGTAAATTAACACTGGATCAGGCAAAGGAAAAGTATCCGGAATGGTACGAAAATAAGGTTGTAAAGGGTATAAGAAGCTATTGGACAGTAAAACGTGATCTATACGACTGGTGGATCCAACAAATAAAAAAAGAAGAAAATGGAGCCAGTTATGGCCACAGATATTTTTGTATTATGACATTGGTGATTTATGGCATAAAATGTGGTTTATCTAAAGATGAGATAGAACAGGATGCAATTGATTTGATACCGTTTCTAAACGGTTTAAATGAAAAAGAACCATTTACAGAGGAAGATATTAAATCAGCTTTAGAGTGTTATGATGAACGATACAATACTTTTCCTTTAAAAGATATTGAGAAATTAACGAATATTCGAATCGAAAGAAATAAACGTAATGGTCGAAAACAAGATCAACATATAAAAATTATGAATGCGATTCGTGATATTGAACATCCAAATGGCTCATGGATTAATAAAGAAGGAGCTCCAACGAAGCAATCAATAGTTCAAAAATGGAAATTAGAAAATCCTGAAGGAACAAAATATCAATGCGTTAAAGATACAGGTTTATCAAAAAACACAGTGAAAAAATGGTGGAACAATTAAC
>SRR9161490_NODE_224_length_4468_cov_41.505099||full shape:circular||start:97||end:2840||group:NCLDV||score:0.94||hallmark:0
TGAAATACACAATAACAGGCTTTTCAGAAAAATTCGATATTCCTAAGGATGAAGTAATTAAAAACCTAAATACGACTTATAAATCGTATGTAACGAAAGAAAGAGGAATAACTTATATTGATGAACAGGCGGCGCGGCAGAAACAGGAAGAAACAAAAGTAGAAGAAACTGTAAGCACTATAAGCGAAGAACAGAAAGAACTAAACAATAATAATGCGCTCATAGATGGATATAAGGCGCAAATAAGTGAGCTAAAGCAGGAATTAGCGAAGGAAAGAGAGAAAAACAGCGAAACAGAAGCAAAGCTATTAGAAATGATGGATAAGGTTATAAAACTAACAGAGAATACACAGATTCTAATGGCGCAGATTCAAAGCCAGCACCAGCTTTTAATAGAGAACAACAAAAAGAAGCGAACTATAAGAGAAGTGTTTAGCGACTTTATAAAAAAAGAGAAGCCGTAATTTACGGCTTCTTTTTCTTACCAGTCTTTAAGTTTATCAAAGATGCTGGGCTTTTTCGGTGCTTTTTCAACGCTCAAATCAATCTCATCCTTGTATTCTTCAAGGAAATGTTTTGTAGCTGTCATAACGCTTTTACGCATTTTAACCGTTCCATCTTCATTTGTTTCATAAACTATATTTCCGTTTTTATCTGTTACTGGCTGTTTATTTTCATCTTTCTTTTCAATAGGTTTAATTGCGGCATCCTTAAACTTCTTCTTATCTTCTTTTGTGTTATGATAGGTTTCAATATAATCTACCATATTCGCAATAGTAAGAGCTTTTCGCGCTTGTGCTTTATAGATATTCAAATTCTCAGAAGTGGAATTATTATTGATATACTCCATCCACTTATCTACGGGGTTATCATCGAAGAACTTTTTAATTTTTTCTTTTTGCTCTTTTGTCATTTTGCTTTTTTCATTGCTCATAGCTTTTAGCCCCCTATGAATTTATTACTAAGATTATAGCAGTATAAGGGGAGATAGTCAAGCTGTTTGTAAAGTGTTTATATTTCCTTGAATTAAGGACAGCTTCAAATCTGCGTTTATAAATCTCAGAAAATCACTTTACGAATATGTTCGTAAATTAAATAATATTTCTTGACACTTTTTTATTTGCAAGCTATAATAGTAACCGTAAGAGAGAGTAACGAAAACACAACGATTTGAAAGGAACTATAAGTAATGTCTAAGATAGCATATTTGAGGGTAAGCACCACACATCAGAACACAGCGCGGCAGGAATACGCAATGCCAGCTGATATTGATAAGGTGTTTGAAGATAAGGCGAGCGGGAAGGACACAGAGCGCCCAGAGTTTAAGAAGATGCTCGATTATGTGCGCGAAGGTGATATAGTCTACTTTGAGAGCTTTTCCCGCATAAGCCGCAGTTTGCCCGATTTACTCAATACTCTTGATTATTTCACGCAAAAGGGCGTTTCCTTTGTGTCGCTGAAAGAGAACATCGACACGACGGGAGCAACGGGAAAGCTTATTGTGTCGGTGCTGGGTGCTATAAGTGCCTATGAGCGGGAAATAAACGCAGAACGGCGGGAATATGGCTACCGCAAAGCCCTTAACGAAGGGAAGGTAGGACGACCCAAAGCCGAAGTAAGCGACAAACTAAGAGAAGCAGTAAAACGCTGGCGTGCGGGAGAGATTACAGCGACCGAAGCAATGAGAATCAGCGGCACAACGCGAACAACGTTCTACAAGCTGGTGAAGAAAGAGGGGCTATAACCCCTCTTTTTTTGATTCAGCAGAGTTTAGAGAGACAATAAACCCGCCTTTTTTCTCCCTCAAATCTCGGCTGGAGGATTCCTCCAGCCACGCTAAGTAGCCACTGGGGGGCTATATTCGCGGAAAATCAAAAAAAGCAAGGCTATAATGGGGTTTGCGTGGTTCAACATCCCGCTAAATTATGCTTTTTTTAGTTAATAAATTTTAGGACAGAGCAGGATAAAGCATATAGAAGAAAATTGAAAGAAAAATGGTC
GGGAATAATATACAGTAGTAATATCTAAGGGGAGTAATGAGAAATGAAAAGAACAGATAACTATACAGTAGTATCATTCAGAGTAGAAGAAGAATTAGCAGAGCAATTAAAGGCAGAAGCAAAGCGCAGGTATATGTCAGCGTCAGCTTATATAAGAAAACTATTAGTTTATGATTTGAAAGGGGAAAATAAGTAATGATAAGAAATAGCTTAGAAAATTTAATAAGTGAAGAAACAAAAAAAAGCCCAACATATTAGCTATACAAATAGCATAGAAATATGGTTGATAGCTTTTATCAAGAATATAAGCACCAAGCAGAACTAAAGCAAATGGAAGAAAATATTTATAATAGACTTATAAAAGATATAAATATTGAAATCGTAGATAAAGCAACGCCAGCTATTAAAGAATTAGATAAGCAAATAAGGGATATTTTCAAAAAATGAACGCTGGAGAAGTAGAAGCACAGAACTATTTGCTTAAAAATGGTTGGAAAGTAAAGAATCTAACGGCGTGTAAGGATTTTTTTAGTAAAGATATAGATTTCCTAATAGAGAGAGATTAGGAAAGATTTTATATAGAAGTCAAATGGGACACTAAAATTAAACATACTGGTAATATGTTTATAGAAGTTAGCGCAGATATAGAAAACAACAAAGACGGCTGGTATAATTATTGTGAAGCAGACTTTATTTTCTATGGAGATGCTTTGAATAAATTGTTTTATGTATTTAGATTATAGG
>SRR9161490_NODE_255_length_3849_cov_51.199262||full shape:circular||start:3223||end:3600||group:dsDNAphage||score:0.887||hallmark:0
TAGTCCTCGTCTTCTATATCTAACATTTCCTCTATTTGAGTTATACGTTTTTCTAAGTCTAAAAGTTGTATTGCTTTGATATTTGCATTACTACAACTTAGTATTGCTTTTCCTTGTGAAGGTGTTACTGTTCCCTCTCTTATTTCCTTAGCAAGTTTCTCATTAGTTGCTAGTATCTGTTTTAAGTTTCTTATTGCTATGTAATCTGAATCTTTTACGAACTCCATTTCTGCCACCTCCTAACCTGTTTCATGTATTAGCTTTCTTATTAAAGCACTTTTGGTCATGTTATATTTATTTGCCAACTGTTCTAGTTTCTCTATATCTTTTTTGGATACTCTCACTTCTAGTCTTTTATCTTTTATATCTTTTTTCATA
>SRR9161490_NODE_293_length_3437_cov_5738.632170||full shape:circular||start:747||end:3239||group:lavidaviridae||score:0.98||hallmark:0
TGGCACAAGCATCAATACACTTCGAGCCCGTCAAGGGCGGCAGCGAGGAACACAACAGACGTTTGAAGTTCCTCGATTACGTGCGACCTGATCGCACGCACCTCAACGACTATTGGGAGAGCGGAACGCAAAGCGATCGCCTTGCAAACATCACCCAAAATTTCCTCGAACATCACCCAACTCGCAAGAAGCTTCACGCAAAAGCAACCCCCATCAGGGAGGCAGTCGTGAACATCACCGAAGAAACCACGATGACCGACCTCTTGCGTTTGGGGTCACGGCTTAATGAACGTTTTGGCATAAGCATCTTCCAAATTGCCATTCACAAGGATGAGGGGTATTTCGGTTCAGATACCGACAAACTGAACCTTCATGCCCACCTCGTAGCAGATTGGACGAACCCAAGCAATGGCGAATCTATCAAACTCAATCGGCAAGACATGGCAGAGATGCAGACCATCACCGCAGAGGTTCTTGGGATGCAGCGAGGTGTTTCTTCTGATAAGAAGCATCTCACAGCTATGCAGTATAAAGAGCAGAAAGCACGTGAGGAAGCGGAGAAAGCAAAGCAAGAACAACTCAAAGCGGAATCCGCCCAGAGAGTTGCCGAGCGCAAAGCTGCTGAAGCCATGGAGAAGAAGAAAACGGCAGAAGCGGCAGCGGTGAGCGGCTTAGTTGTCGAAAGTACTAAGAAACTCGGCAATCTGCTCGGCTTTGGCAAGGAAGCAAAGGCACTGAAGGAACTACCTGCACAACTGGATGCCGCAAAAGCTGAAGGACGAGCGGAAGCGGTCGAAGAGGTTCTGAAGGGAGCAGGCATGAAGTACAACGATATGTCGAAAGTAACCCCCGAGAAGGTCGGAAAAGACTTGATGAACATAGTTCACAAGAATGCAGAAGCCGCACAAGAGGACACGAAGAAACTTAGAATCATACAGAATATGACAGAAGGAAATTACACTTATGATGCAGCTGCAAAGCTCATCAAAGAGAAATATGCCGATATGGCATACTACAAAGAAGCTTTTGGGTACGCAGGTAGTGCAGATGCATTCGACTTCAAAAAAGAAGTGTTCAACCCGCTCTGTGAGCGTCAGGGAGCGCACAACACTTCTAGTGATGAACACGCAATCGGTCGCAGAGAAATATGCGCACAGGGCATTGTATGCGCCTGCATCCGCTTCTTTGATTCTTTCAAACTCGATAAGATAGCAAAGACCCTTAAAGCGATGGCACGAGATTTCAGTCTCGCCGATTGGCGAAAGCAACAAGAATACTCTCGCCAACTCCAGGAGCAGAACCAGGAACGGAAGAACCAGGAGCAGAAGCAGGGAAGAGGATGGACTTTCAGAAGATAGCAACAAAAAGAGCACTCGAGGTTATCCCTGAGTGCTCTTTTTGTTGAATAGTTCTAACTCTTTTCCCTTCATCTTCTCGTGGTTGATGATTTCCATGAGGTGGTCTGGTGCTACCATTTCCGTTAGAATCTCTCCTGAATCTACATCATAATAGCCAAACTTATCTTCTATCTCTTTCCATGAGAATGTTTTCCAAGACATATCTTGCAGGAAAATAGCTGCTTCTTCTTCCGTCTCTTCTTCGTTGACTTCTCCGCACTTCCAACTTTCGTCCGAGCCGATGCGTTTATATACCTGCCTTCCGCTCATCGCTTCAAATATCACGTTCATGTGCTTTGCATCAAAGAAGCGTTTTCCGTTGCTGTCCTTTGCAACCAACTTCGTGAAGTACTTGAATATCTCAAGATATGCTTTTTTGTTCGTCACTTCCTGATAGTTTTGGGCTGCAGGTGACGACTCTGGGTTTATCTTGAGCCACTTTTCTATAATCTTCAAGGCTGCTTCCTTATTGTTCACTAGTATGTGAAAGTGCGGATGGTACGTGTCGTACCACTCATCTTTCTCAGATGTACGTTCGAATCTCTGTACCTTGTACCACTTTCCGTGTTCTTGCTTCCACTCAGGTGGCAGCTTCTTCAAAAC
ATATCTGTTTGCGTGATAGGTACACTCCAGTTTCTTGATGCCTATCATATTCTCGAGCACTGTTGCACGGCGAAACCACTTTGAACTTTTGATTAATTGCCACTTCTTGTTATAAGTTGCAATCTCTTCAGGAAGCTGCTCTGCTCGAACGTTCGGACGTGTCAGGGTGATGAAATATAACTCTTTCTCATCCTGCAATCGAGGTGCATACGCATTAATAAGCGTACCCATCCTGATGCGCTGACACTGCGGACACCACCTATTTTTGCAGTACTTTGCAGTAATTCTTCCGTTGCCTTGATACAACTTTTCACAGCAGTGGAAGGAGTTCTGATATCTAGTTCTAAGACTAGAATCAGGGTTCTGATAGTACAACATACTAGCGAGGTGATAGCCAAACCACCTGTATTTGTTCTTTTTTCGCAATGCGATAGTGACTTTTAACGCAGAATCTTTTGCGTTCTCGGAATTTTTATCTAACTTTGCACCCATG
final-viral-boundary.txt final-viral-combined.fa.txt final-viral-score.txt
Thanks!
@hjdong, can you send the original sequence of `>SRR9161490_NODE_255_length_3849_cov_51.199262` from the VirSorter2 input? `>SRR9161490_NODE_255_length_3849_cov_51.199262||full` in the output is very short (< 400 bp), so there is probably a bug in handling such short sequences. For practical purposes, you can also just remove such short sequences (using a cutoff of 1 kb or larger). These short sequences are generally not reliable unless they have a hallmark gene.
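The workaround described here, dropping short output sequences unless they carry a hallmark gene, could look roughly like the sketch below. The helper names are hypothetical, the 1000 bp cutoff is the lower bound mentioned above, and keeping short hallmark-bearing records is one reading of that advice rather than confirmed tool behavior.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def hallmark_count(header):
    """Read the 'hallmark:N' field from a '||'-separated header; 0 if absent."""
    for item in header.split("||"):
        if item.startswith("hallmark:"):
            return int(item.split(":", 1)[1])
    return 0


def drop_short(records, min_len=1000):
    """Keep records that are at least min_len long or have a hallmark gene."""
    for header, seq in records:
        if len(seq) >= min_len or hallmark_count(header) > 0:
            yield header, seq


# Made-up records: one long, one short without hallmark, one short with hallmark.
sample_lines = [
    ">long_contig||full shape:linear||score:0.9||hallmark:0", "A" * 1200,
    ">short_contig||full shape:linear||score:0.887||hallmark:0", "A" * 377,
    ">short_with_hallmark||full shape:linear||score:0.9||hallmark:1", "A" * 377,
]
kept = list(drop_short(read_fasta(sample_lines)))
print([header.split("||")[0] for header, _ in kept])
```

With a real file, pass an open file handle for `final-viral-combined.fa` to `read_fasta` instead of the sample list.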
original sequence
Here is the original sequence for VirSorter2 (version 2.0.beta). Contigs shorter than 1 kb were discarded. My command:
virsorter run -w SRR9161490 -i SRR9161490.contigs.1k.fa.txt -j 12 -d vir2
@hjdong, I have fixed the issue. The sequence names in `final-viral-combined.fa` and `final-viral-score.tsv` should now be the same. I also added a few extra columns so you can filter the score table by score, sequence length, hallmark gene count, viral gene %, and cellular gene %.
Thanks for providing the data for me to reproduce the issue.
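To check whether the names in the two output files agree on your own data, the sets of names can be compared directly. The sketch below assumes the first column of the score table is called `seqname` (an assumption; check your file) and uses a made-up two-line table together with the two header names from this thread.

```python
import csv


def fasta_names(lines):
    """Collect the sequence names (text after '>' up to the first space)."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def table_names(lines, name_col="seqname"):
    """Collect the names in the name_col column of a tab-separated table."""
    return {row[name_col] for row in csv.DictReader(lines, delimiter="\t")}


# Sample data: header names from this thread, table contents made up.
fasta_lines = [
    ">SRR9161490_NODE_224_length_4468_cov_41.505099||full shape:circular||score:0.94",
    "TGAAATACACAATAACAGG",
    ">SRR9161490_NODE_255_length_3849_cov_51.199262||full shape:circular||score:0.887",
    "TAGTCCTCGTCTTCTATAT",
]
table_lines = [
    "seqname\tmax_score\n",
    "SRR9161490_NODE_224_length_4468_cov_41.505099||full\t0.94\n",
]

missing = fasta_names(fasta_lines) - table_names(table_lines)
print(sorted(missing))
```

An empty result means every sequence in the FASTA file has a row in the score table. Both functions accept open file handles as well as lists of lines.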
Hi @jiarong, I downloaded the conda version, and that discrepancy doesn't seem to be solved.
Regards, Maria
Hi @jiarong, you mentioned that the `--hallmark-required-on-short` option requires sequences shorter than 5K to have a hallmark gene, but the `--help` text says: "--hallmark-required-on-short require hallmark gene on short seqs (length cutoff as "short" were set by "MIN_SIZE_ALLOWED_WO_HALLMARK_GENE" in template-config.yaml file, default 3kbp); this can reduce false positives at reasonable cost of sensitivity [default: False]". So how can I change the cutoff used by `--hallmark-required-on-short`? Should I pass something like `--hallmark-required-on-short 5000`, or is it set some other way? Also, I could not find the template-config.yaml file. What should I write on the command line?
Thank you so much! Looking forward to your reply!
@Jiulong-Zhao the default setting of `MIN_SIZE_ALLOWED_WO_HALLMARK_GENE` has changed. You can modify it for each viral group in `template-config.yaml`. If you installed the development version, it should be in the `VirSorter2/virsorter` directory. If you installed via bioconda, it is trickier: you can update to the newest version and run `virsorter config --show-source` to track it down.
@jiarong Thank you for your answer! I have found the template-config.yaml file in that directory! Thanks again!
Dear jiarong: First of all, thank you very much for developing such wonderful software. It works really well. When I compared VirSorter2 with VirSorter and VirFinder under default parameters, I found that VirSorter2 could generate many more viral contigs, even tens of times more, which made me both excited and worried. We know that both VirSorter and VirFinder results need to go through some filtering rules to get better results, such as "category 1 and 2" for VirSorter and "score > 0.7 and p < 0.05" for VirFinder. So do I need to filter the results of VirSorter2 too? What are the rules and criteria for filtering?
These are the results from the three tools:
- Total contigs (input): 52497
- Viral contigs identified by VirSorter2: 35750
- Viral contigs identified by VirSorter: 1942
- Viral contigs identified by VirFinder: 1943

It looks like VirSorter's results are more similar to VirFinder's.
Thanks again. Looking forward to your reply.