jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0

Error in rule circular_linear_split #142

Closed: DrYoungOG closed this issue 1 year ago

DrYoungOG commented 1 year ago

Hi, jiarong! Thanks for the excellent software!

The test dataset worked well after installation.

The server I used runs Linux version 3.10.0-862.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)).

There was an error in rule circular_linear_split while running VirSorter2 on my own paired-end metagenome sequences:

    [2022-12-07 10:01 INFO] VirSorter 2.2.3
    [2022-12-07 10:01 INFO] /home/yangpc/anaconda3/envs/vs2/bin/virsorter run -w /home/yangpc/YihuoProject_metagenome_analysis/virsorter2/output/S0101w0-LED4262_L1 -i /home/yangpc/rawdata/YiHuo_Project/metagenome_raw_sequence/S0101w0-LED4262_L1_1.fastq --include-groups dsDNAphage,NCLDV,RNA,ssDNA,lavidaviridae -j 40 --keep-original-seq --prep-for-dramv --rm-tmpdir all
    [2022-12-07 10:01 INFO] Using /home/yangpc/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/template-config.yaml as config template
    [2022-12-07 10:01 INFO] conig file written to /home/yangpc/YihuoProject_metagenome_analysis/virsorter2/output/S0101w0-LED4262_L1/config.yaml

    [2022-12-07 10:01 INFO] Executing: snakemake --snakefile /home/yangpc/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/Snakefile --directory /home/yangpc/YihuoProject_metagenome_analysis/virsorter2/output/S0101w0-LED4262_L1 --jobs 40 --configfile /home/yangpc/YihuoProject_metagenome_analysis/virsorter2/output/S0101w0-LED4262_L1/config.yaml --latency-wait 600 --rerun-incomplete --nolock --conda-frontend mamba --conda-prefix /home/yangpc/db/conda_envs --use-conda --quiet all
    Job counts:
        count   jobs
        1       all
        1       check_point_for_reclassify
        1       circular_linear_split
        1       classify
        5       classify_by_group
        5       classify_full_and_part_by_group
        1       combine_linear_circular
        5       combine_linear_circular_by_group
        1       extract_feature
        1       extract_provirus_seqs
        1       finalize
        1       gff_feature
        5       gff_feature_by_group
        5       hmm_features_by_group
        1       hmm_sort_to_best_hit_taxon
        5       hmm_sort_to_best_hit_taxon_by_group
        5       merge_annotation_table_by_group_from_split
        1       merge_annotation_table_from_groups
        1       merge_classification
        1       merge_full_and_part_classification
        5       merge_hmm_gff_features_by_group
        5       merge_provirus_call_by_group_by_split
        1       merge_provirus_call_from_groups
        6       merge_split_hmmtbl
        30      merge_split_hmmtbl_by_group
        30      merge_split_hmmtbl_by_group_tmp
        1       pick_viral_fullseq
        1       preprocess
        1       split_faa
        5       split_faa_by_group
        5       split_gff_by_group
        138
    [Wed Dec 7 10:05:01 2022]
    Error in rule circular_linear_split:
        jobid: 11
        output: iter-0/pp-seqname-length.tsv
        conda-env: /home/yangpc/db/conda_envs/f4b3daae
        shell:

    # prep_logdir
    mkdir -p log/iter-0/step1-pp log/iter-0/step2-extract-feature log/iter-0/step3-classify

    Cnt=$(grep -c '^>' /home/yangpc/rawdata/YiHuo_Project/metagenome_raw_sequence/S0101w0-LED4262_L1_1.fastq)
    if [ ${Cnt} = 0 ]; then
        echo "No sequnences found in contig file; exiting"               | python /home/yangpc/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/echo.py --level error
        exit 1
    fi 

    python /home/yangpc/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/circular-linear-split.py           /home/yangpc/rawdata/YiHuo_Project/metagenome_raw_sequence/S0101w0-LED4262_L1_1.fastq           iter-0/pp-circular.fna.preext          iter-0/pp-linear.fna           iter-0/pp-seqname-length.tsv           "||rbs:common"           0

    if [ ! -s iter-0/pp-circular.fna.preext ]; then
        echo "No circular seqs found in contig file"               | python /home/yangpc/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/echo.py
        rm iter-0/pp-circular.fna.preext
    else
        python /home/yangpc/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/circular-extend.py               iter-0/pp-circular.fna.preext iter-0/pp-circular.fna
    fi

    if [ ! -s iter-0/pp-linear.fna ]; then
        echo "No linear seqs found in contig file"               | python /home/yangpc/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/echo.py
        rm iter-0/pp-linear.fna
    fi

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

*** An error occurred. Detailed errors may not be printed for certain rules.
Refer to the log file of the failed command for troubleshooting.
Issues can be raised at: https://github.com/jiarong/VirSorter2/issues

The attached file is the output of the run.

Thank you for your help!

virsorter2_output.zip

jiarong commented 1 year ago

Hi, the input for VirSorter2 should be genome or contig sequence in fasta format.

DrYoungOG commented 1 year ago

Thank you for your prompt reply!

I am new to this field, please allow me to ask a few basic questions:

1. "the input for VirSorter2 should be genome or contig sequence in fasta format" means that there are two choices of input for VirSorter2: one is the raw metagenome sequencing file obtained directly from Illumina; the other is to first assemble that raw sequencing file into contigs using assembly software such as MEGAHIT, and then give the contig file to VirSorter2. Is my understanding right?

2. My raw metagenome sequencing files from Illumina are in FASTQ format, so I need to convert FASTQ to FASTA format, right?

Thanks for your patience!

jiarong commented 1 year ago

No problem. VirSorter2 does NOT take short reads. It only takes genomes (a whole genome as one sequence) or a contig file assembled from a metagenome, as you described.
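
For anyone hitting the same error, a quick check of the input format before launching a run may save some time. The function below is a minimal sketch, not part of VirSorter2: `seq_format` is a hypothetical helper name, and it assumes an uncompressed text file whose first non-empty line starts with `>` (FASTA) or `@` (FASTQ).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VirSorter2): guess the format of a
# sequence file from the first character of its first non-empty line.
# Assumes an uncompressed text file; gzipped input is reported as unknown.
seq_format() {
    local first
    # awk skips blank lines, prints the first character of the first
    # field on the first non-blank line, then stops reading.
    first=$(awk 'NF { print substr($1, 1, 1); exit }' "$1") || true
    case "$first" in
        '>') echo fasta ;;
        '@') echo fastq ;;
        *)   echo unknown ;;
    esac
}
```

Calling `seq_format` on the file passed to `-i` and getting `fastq` back means the file holds raw reads. In that case, converting FASTQ to FASTA is not enough: per the reply above, the reads need to be assembled into contigs first, and the resulting contig FASTA is what goes to `virsorter run -i`.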

DrYoungOG commented 1 year ago

I see. Thank you very much!