Closed ntromas closed 4 months ago
Hi, sorry for the late reply. I missed this issue somehow... I cannot tell the exact problem from the information in the post. It's generally not recommended to have punctuation other than underscore or dot in the original FASTA header. If you search for "S1Ck141NC54963" in the input FASTA, you might be able to find out what in the header is causing the issue.
Hi,
Thanks for the answer. The FASTA header consists only of S1Ck141NC54963, which is why I don't understand what is wrong with the header name.
Thanks for the help!
Cheers
Nico
There might be hidden characters around it, which can happen if the file became corrupted during large-scale data processing. If this is the case, I would suggest splitting the big file into smaller pieces and running them separately.
Hi,
I did it and got the same issue with smaller files.
Split input:
nicot@SuperPhelix5000:~/virus/new_analysis/splitted_virus_fasta$ ls
out_0.fasta out_1.fasta out_2.fasta out_3.fasta out_4.fasta out_5.fasta out_6.fasta out_7.fasta out_8.fasta out_9.fasta
Checking the header that causes the issue:
nicot@SuperPhelix5000:~/virus/new_analysis/splitted_virus_fasta$ grep -A 1 "S1Ck141NC54963" out_0.fasta
S1Ck141NC54963
GACCTATTGATTTTGTGACAAGGCGCAAAGCATCAAATTCGTTCATGGGCTTGCGTTCTAACTCTGCCAG
nicot@SuperPhelix5000:~/virus/new_analysis/splitted_virus_fasta$ grep -e "S1Ck141NC54963" out_0.fasta
S1Ck141NC54963
I don't see any special characters... or maybe it is a space. I can send you an example of the input.
Cheers,
Nico
So the other files ran successfully, right? To see special characters, you need to open the file in a text editor. If the file is too big, you can run
grep -A 1 -B 2 "S1Ck141NC54963" out_0.fasta > tmp.fasta
Then open tmp.fasta in a text editor.
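If opening the file in an editor is inconvenient, the same check can be scripted. The following is only a minimal sketch, assuming a plain-text FASTA file whose headers start with ">"; the function names and the SAFE character set are hypothetical and simply encode the advice above (letters, digits, underscore, dot). It is not part of VirSorter2.

```python
import string

# Assumption: only letters, digits, underscore and dot are considered safe in headers.
SAFE = set(string.ascii_letters + string.digits + "_.")

def find_suspicious_headers(lines):
    """Yield (line_number, repr(header), bad_chars) for headers with characters outside SAFE."""
    for n, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Strip only "\n" so that a stray "\r" (Windows line ending) is still reported.
        header = line[1:].rstrip("\n")
        bad = sorted({c for c in header if c not in SAFE})
        if bad:
            yield n, repr(header), bad

def scan_file(path):
    """Print every suspicious header found in the FASTA file at `path`."""
    # newline="" disables newline translation so "\r" characters stay visible.
    with open(path, newline="") as fh:
        for n, header, bad in find_suspicious_headers(fh):
            print(n, header, bad)
```

Calling scan_file("out_0.fasta") would print one line per suspicious header; repr() makes spaces, tabs and carriage returns visible. No output means no header contains characters outside the assumed safe set.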
I did not find any hidden special characters in the attached files, but there must be some issue in your input file. How big is out_0.fasta? Can you send me one of the smallest files that failed the run?
I just had time to take a look. It turns out that the duplicate group in your command, --include-groups dsDNAphage,dsDNAphage,NCLDV,RNA,ssDNA,lavidaviridae, is causing the issue. If you remove the duplicated dsDNAphage, it should run successfully.
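For anyone building the group list in a script, duplicates can be dropped before the command is assembled. This is a minimal sketch; dedupe_groups is a hypothetical helper, not something VirSorter2 provides.

```python
def dedupe_groups(groups_arg):
    """Return a comma-separated group list with duplicates removed, first occurrence kept."""
    groups = [g.strip() for g in groups_arg.split(",") if g.strip()]
    # dict.fromkeys keeps insertion order, so the original ordering is preserved.
    return ",".join(dict.fromkeys(groups))

print(dedupe_groups("dsDNAphage,dsDNAphage,NCLDV,RNA,ssDNA,lavidaviridae"))
```

The printed value can then be passed to --include-groups.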
Huh... I focused on the error message and did not look closely enough at the command... Feeling a bit stupid now :) Thanks!
Hi VS2 team,
I am running VS2 on a large file (contigs from metagenomes) and I got the following error:
2/lib/python3.10/site-packages/virsorter/./scripts/provirus.py iter-0/dsDNAphage/all.pdg.gff.splitdir/all.pdg.gff.0.split iter-0/dsDNAphage/all.pdg.hmm.tax /mfs/nicot/virus/new_analysis/VIRSORTER_DRAM/vir_db/rbs/rbs-catetory.tsv /mfs/nicot/virus/new_analysis/VIRSORTER_DRAM/vir_db/group/dsDNAphage/model iter-0/dsDNAphage/all.pdg.gff.splitdir/all.pdg.gff.0.split.prv.bdy iter-0/dsDNAphage/all.pdg.gff.splitdir/all.pdg.gff.0.split.prv.ftr --fullseq-clf iter-0/all-fullseq-proba.tsv --group dsDNAphage --proba 0.5 2> $Log || { echo "See error details in $Log" | python /mfs/nicot/miniconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/echo.py --level error; exit 1; } fi rm -f $Log
[2024-05-17 02:47 ERROR] See error details in /mfs/nicot/virus/new_analysis/VIRSORTER_DRAM/VS2/log/iter-0/step3-classify/pick-viral-fullseq.log [Fri May 17 02:47:04 2024] Error in rule pick_viral_fullseq: jobid: 38 output: iter-0/viral-fullseq.fa, iter-0/all-hallmark-cnt.tsv, iter-0/viral-lt2gene-w-hallmark.fa conda-env: /mfs/nicot/virus/new_analysis/VIRSORTER_DRAM/vir_db/conda_envs/5631f754 shell:
This is the log information:
cat /mfs/nicot/virus/new_analysis/VIRSORTER_DRAM/VS2/log/iter-0/step3-classify/pick-viral-fullseq.log
Traceback (most recent call last):
  File "/mfs/nicot/miniconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/add-extra-to-fullseq-fasta-header.py", line 114, in <module>
    main()
  File "/mfs/nicot/miniconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/add-extra-to-fullseq-fasta-header.py", line 92, in main
    start_ind, end_ind, viral, cellular, hallmark = d_name2info[name]
KeyError: 'S1Ck141NC54963'
Is there a specific format for the header?
Command and version:
VirSorter 2.2.4
/mfs/nicot/miniconda3/envs/vs2/bin/virsorter run -i ../virus_postCheckV_5000.nored_nodup.fa --include-groups dsDNAphage,dsDNAphage,NCLDV,RNA,ssDNA,lavidaviridae -j 50 --prep-for-dramv -d vir_db/ -w VS2
Cheers,
Nico