jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0

Error when I run VirSorter2 again #154

Open paulamaza opened 1 year ago

paulamaza commented 1 year ago

I would appreciate some help, thanks. I got this error:

[2023-03-16 20:32 INFO] # of seqs < 5000 bp and removed: 14
[2023-03-16 20:32 INFO] # of circular seqs: 2
[2023-03-16 20:32 INFO] # of linear seqs : 665
[2023-03-16 20:32 INFO] Finish spliting circular contig file with common rbs
[2023-03-16 20:32 INFO] Finish spliting linear contig file with common rbs
[2023-03-16 20:33 INFO] Step 1 - preprocess finished.
[2023-03-16 21:03 INFO] Step 2 - extract-feature finished.
sed: 1: "s/(||full([[:space:]] ...": \2 not defined in the RE
[Thu Mar 16 21:04:17 2023]
Error in rule finalize:
    jobid: 7
    output: final-viral-combined.fa, final-viral-score.tsv
    conda-env: /Users/paulamaza/Visorter/db/conda_envs/b19043f6
    shell:

        echo iter-0/*/all.pdg.gff.splitdir/all.pdg.gff.*.split | xargs rm -f
        python /Users/paulamaza/Visorter/VirSorter2/virsorter/./scripts/filter-score-table.py config.yaml iter-0/viral-combined-proba-more-cols.tsv iter-0/viral-combined.fa final-viral-score.tsv final-viral-combined.fa

        if [ True = "True" ]; then
            mkdir -p for-dramv
            python /Users/paulamaza/Visorter/VirSorter2/virsorter/./scripts/modify-seqname-for-dramv.py final-viral-combined.fa final-viral-score.tsv -o for-dramv/final-viral-combined-for-dramv.fa
            cp iter-0/viral-affi-contigs-for-dramv.tab for-dramv
        fi

        N_viral_fullseq=$(grep -c '^>.*||full$' final-viral-combined.fa || :)
        N_viral_lt2gene=$(grep -c '^>.*||lt2gene$' final-viral-combined.fa || :)
        if [ True = True ]; then
            Dramv_notes="for-dramv                  ==> dir with input files for dramv"
            Dramv_notes2="For seqnames in files for dramv, 
                | is replaced with _ to be compatible with DRAMv"
        else
            Dramv_notes=""
            Dramv_notes2=""
        fi
        if [ True = True ]; then
            sed -i -E 's/(\|\|full([[:space:]]+)|\|\|[0-9]+_partial([[:space:]]+)|\|\|lt2gene([[:space:]]+))/\2\3\4/;' final-viral-score.tsv
            sed -i -E 's/(\|\|full$|\|\|[0-9]+_partial$|\|\|lt2gene$)//;' final-viral-combined.fa
            if [ True = True ]; then
                sed -i -E 's/(__full(\|[0-9]+\|(c|l)$)|__[0-9]+_partial(\|[0-9]+\|(c|l)$)|__lt2gene(\|[0-9]+\|(c|l)$))/\2\4\6/;'  for-dramv/viral-affi-contigs-for-dramv.tab
                sed -i -E 's/(__full(__[0-9]+\|)|__[0-9]+_partial(__[0-9]+\|)|__lt2gene(__[0-9]+\|))/\2\3\4/;' for-dramv/viral-affi-contigs-for-dramv.tab
                sed -i -E 's/(__full(-cat_[1-6]$)|__[0-9]+_partial(-cat_[1-6]$)|__lt2gene(-cat_[1-6]$))/\2\3\4/;' for-dramv/final-viral-combined-for-dramv.fa 
            fi
            Suffix_notes=""
        else
            Suffix_notes="
            Suffix is added to seq names in final-viral-combined.fa:
            contigs (>=2 genes) as viral:   ||full
            contigs (< 2 genes) as viral:   ||lt2gene
            $Dramv_notes2
            "
        fi

        printf "
        ====> VirSorter run (non-provirus mode) finished.
        # of contigs w/ >=2 genes as viral: $N_viral_fullseq
        # of contigs w/ < 2 genes as viral: $N_viral_lt2gene

        Useful output files:
        final-viral-score.tsv      ==> score table
        final-viral-combined.fa    ==> all viral seqs
        $Dramv_notes
        $Suffix_notes

        NOTES: 
        Users can further screen the results based on the 
            following columns in final-viral-score.tsv
            - contig length (length) 
            - hallmark gene count (hallmark)
            - viral gene %% (viral) 
            - cellular gene %% (cellular)
        The "group" field in final-viral-score.tsv should NOT be used
            as reliable taxonomy info
        We recommend this SOP/tutorial for quality control 
            (make sure to use the lastest version):
            https://dx.doi.org/10.17504/protocols.io.bwm5pc86

        <====
        " | python /Users/paulamaza/Visorter/VirSorter2/virsorter/./scripts/echo.py

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

*** An error occurred. Detailed errors may not be printed for certain rules.
Refer to the log file of the failed command for troubleshooting
Issues can be raised at: https://github.com/jiarong/VirSorter2/issues
Or send an email to virsorter2 near gmail.com if you do not use GitHub

jiarong commented 1 year ago

What operating system (OS) are you using? Only Linux is currently supported.
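One quick way to check which operating system a shell is really running on is `uname -s`; activating a conda environment does not change its answer. The snippet below is only a sketch: the helper name `describe_os` and its messages are hypothetical and not part of VirSorter2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a kernel name (as printed by `uname -s`)
# to a short note, following the statement that only Linux is supported.
describe_os() {
    case "$1" in
        Linux)  echo "Linux: supported" ;;
        Darwin) echo "macOS: not supported" ;;
        *)      echo "$1: not supported" ;;
    esac
}

# Report the kernel of the current machine.
describe_os "$(uname -s)"
```

On a Mac, `uname -s` prints `Darwin` whether or not conda is active.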

paulamaza commented 1 year ago

Thanks for the reply. I think I'm using a Linux OS through "conda". I'm working on macOS Monterey (version 12.0.1) with a 2.4 GHz 8-core Intel Core i9 processor. I ran the viral sequence identification on a test dataset and VirSorter2 step 1 without problems, using this command:

    virsorter run --keep-original-seq -i 5seq.fa -w vs2-pass1 --include-groups dsDNAphage,ssDNA --min-length 5000 --min-score 0.5 -j 28 all

The problem comes with step 2:

    virsorter run --seqname-suffix-off --viral-gene-enrich-off --provirus-off --prep-for-dramv -i checkv/combined.fna -w vs2-pass2 --include-groups dsDNAphage,ssDNA --min-length 5000 --min-score 0.5 -j 28 all

I also tried to install VirSorter2 on a Linux system using the Binder platform, without success:

(notebook) jovyan@jupyter-paulamaza-2dmy-2dfirst-2dbinder-2dxhzhhezk:~$ conda create -n vs2 -c conda-forge -c bioconda "python>=3.6" scikit-learn=0.22.1 imbalanced-learn pandas seaborn hmmer==3.3 prodigal screed ruamel.yaml "snakemake>=5.18,<=5.26" click mamba
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): |
(notebook) jovyan@jupyter-paulamaza-2dmy-2dfirst-2dbinder-2dxhzhhezk:~$ conda activate vs2

EnvironmentNameNotFound: Could not find conda environment: vs2
You can list all discoverable environments with `conda info --envs`.
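In the session above, the second "Collecting package metadata" step never reports `done`, which suggests that `conda create` did not complete and the `vs2` environment was never created. A minimal sketch for checking this before activating follows. It assumes that `conda env list` prints the environment name in the first column; `env_listed` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if environment NAME appears in the first
# column of `conda env list`-style output read from stdin.
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Only suggest activation if conda is installed and vs2 is listed.
if command -v conda >/dev/null 2>&1 && conda env list | env_listed vs2; then
    echo "vs2 exists; safe to run: conda activate vs2"
else
    echo "vs2 not found; 'conda create' probably did not finish"
fi
```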

jiarong commented 1 year ago

Hi, macOS is not Linux, and the installation does not work in a Binder Jupyter notebook. If you do not have access to a Linux computer or server, your best bet is to run a virtual machine on your Mac with a tool such as VirtualBox or Vagrant.
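The `\2 not defined in the RE` message in the log is consistent with the macOS (BSD) `sed` having run the `finalize` rule. BSD `sed` takes the argument after `-i` as a backup suffix, so in `sed -i -E` the `-E` is likely consumed as that suffix. The expression is then parsed as a basic regular expression, in which unescaped parentheses do not form groups. This is a reading of the log, not something confirmed in the thread. As a sketch only, the same kind of suffix stripping can be written in a form that both GNU and BSD `sed` accept by attaching the backup suffix directly to `-i`. The file `demo-score.tsv`, its contents, and the simplified pattern are made up for illustration; this is not the VirSorter2 rule itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up three-line score table standing in for final-viral-score.tsv.
workdir=$(mktemp -d)
table="$workdir/demo-score.tsv"
printf 'contig_1||full\t0.98\ncontig_2||3_partial\t0.75\ncontig_3||lt2gene\t0.60\n' > "$table"

# Writing the suffix as part of -i (-i.bak) works with both GNU and BSD sed,
# so -E is read as the extended-regex flag on either system.
# Group 1 is the suffix to drop; group 2 is the whitespace to keep.
sed -i.bak -E 's/\|\|(full|[0-9]+_partial|lt2gene)([[:space:]]+)/\2/' "$table"
rm -f "$table.bak"

cat "$table"
```

This only illustrates the option-parsing difference; it does not make the rest of the pipeline supported on macOS.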