Closed mujiezhang closed 2 years ago
The error shows that your input sequence is producing a protein sequence longer than 100K AA, which is quite unlikely to be real and should probably be discarded. You can take a look at the contig sequence and check whether there is anything strange.
Thanks for your reply! But I wonder why a sequence that produces a protein longer than 100K AA should be discarded. The input sequence is the bacterium GCF_001499735.1, which was downloaded from RefSeq, and I get the same error with seven other bacteria that were also downloaded from RefSeq. So I wonder why VirSorter2 just quits rather than issuing a warning when it encounters such sequences. I also have another question: the number of contigs in the final-viral-score.tsv file is slightly smaller than in the final-viral-boundary.tsv file. How can I explain that?
From: jiarong Sent: 16 July 2021 13:50 To: jiarong/VirSorter2 Cc: mujiezhang; Author Subject: Re: [jiarong/VirSorter2] errors when predict viruses (#81)
The 100K AA limit is set by a dependency (HMMER), which I cannot control on the VirSorter2 side. It is possible to improve the error handling, though. A gene producing a 100K AA protein is unlikely to be real: either VirSorter2 cannot predict the genes well for these specific bacterial genomes, or there are some issues with the genome sequences.
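For anyone hitting the same error, a quick way to see which predicted proteins exceed the limit is to scan the protein FASTA file before it reaches hmmsearch. The following is only a minimal sketch, not part of VirSorter2: the function names `read_fasta` and `find_overlong` are made up for illustration, and the 100,000 threshold is taken from the error message rather than from the HMMER source.

```python
from typing import Iterable, Iterator, List, Tuple

# Threshold quoted in the hmmsearch error message ("Target sequence length > 100K").
# Assumption: treated here as 100,000 residues.
HMMER_MAX_TARGET_LEN = 100_000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_overlong(lines: Iterable[str],
                  limit: int = HMMER_MAX_TARGET_LEN) -> List[Tuple[str, int]]:
    """Return (record_id, length) for every sequence longer than `limit`."""
    return [(name, len(seq)) for name, seq in read_fasta(lines) if len(seq) > limit]
```

To use it, open the protein file from the working directory (for example the `all.pdg.faa` file named in the log, whose exact location is an assumption here) and pass the file handle to `find_overlong`. The reported IDs show which gene calls to inspect in the original contig.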
final-viral-combined.fa and final-viral-score.tsv are the final results to look at.
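If you want to see exactly which contigs appear in final-viral-boundary.tsv but not in final-viral-score.tsv, a small comparison script can list them. This is a rough sketch under several assumptions that are not confirmed in this thread: both files are tab-separated with a header row, both have an ID column called `seqname`, and IDs may carry a `||` suffix that should be ignored when comparing. Adjust the `column` argument if your files differ. The helper names are hypothetical.

```python
import csv
from typing import List, Set


def read_column(path: str, column: str = "seqname") -> Set[str]:
    """Collect the values of one column from a tab-separated file with a header."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[column] for row in reader}


def base_name(seqname: str) -> str:
    """Drop an assumed '||' suffix so IDs from both files are comparable."""
    return seqname.split("||", 1)[0]


def missing_from_score(boundary_path: str, score_path: str,
                       column: str = "seqname") -> List[str]:
    """Return contig IDs present in the boundary file but absent from the score file."""
    boundary = {base_name(s) for s in read_column(boundary_path, column)}
    score = {base_name(s) for s in read_column(score_path, column)}
    return sorted(boundary - score)
```

Calling `missing_from_score("final-viral-boundary.tsv", "final-viral-score.tsv")` returns the list of contigs to look at. It does not explain why they were dropped, only which ones differ.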
Thank you very much! That is helpful!
Hi, I ran into a strange error when using VirSorter2. It looks like this:
[2021-07-16 10:29 INFO] # of seqs < 5000 bp and removed: 0
[2021-07-16 10:29 INFO] # of circular seqs: 0
[2021-07-16 10:29 INFO] # of linear seqs : 1
[2021-07-16 10:29 INFO] No circular seqs found in contig file
[2021-07-16 10:29 INFO] Finish spliting linear contig file with common rbs
Fatal exception (source file p7_pipeline.c, line 697):
Target sequence length > 100K, over comparison pipeline limit.
(Did you mean to use nhmmer/nhmmscan?)
/usr/bin/bash: line 34: 236456 Aborted hmmsearch -T 30 --tblout iter-0/all.pdg.faa.splitdir/all.pdg.faa.0.split.Mixed.splithmmtbl --cpu 1 --noali -o /dev/null $Hmmdb $Tmp/$Bname
[Fri Jul 16 10:30:44 2021]
Error in rule hmmsearch:
jobid: 87
output: iter-0/all.pdg.faa.splitdir/all.pdg.faa.0.split.Mixed.splithmmtbl
conda-env: /lustre/home/acct-clsjhh/clsjhh/zmj/db/conda_envs/59c18b67
shell:
...
So what is the reason for this? And how can I solve the errors? Thanks!