jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0

Issue with --prep-for-dramv part of SOP #126

Closed yugen-miyahara closed 2 years ago

yugen-miyahara commented 2 years ago

Hi there,

Running the development version, I was able to fix the HMM error and could run the first round of the VirSorter2 SOP fine on my data. But after running CheckV and then running its output through VirSorter2 again, I get an error. I looked through the log files and the existing issues but couldn't find anything about this problem.

The code I ran: "virsorter run --seqname-suffix-off --viral-gene-enrich-off --provirus-off --prep-for-dramv -i ~/Desktop/checkV_novaseq_output/1/viruses.fna -w ~/Desktop/vs2_output_novaseq/p2 --include-groups dsDNAphage,ssDNA --min-length 10000 --min-score 0.5 -j 28 all"

Here is the error:

[2022-08-25 00:50 INFO] Step 1 - preprocess finished.
[2022-08-25 01:03 INFO] Step 2 - extract-feature finished.
sed: 1: "s/(||full([[:space:]] ...": \2 not defined in the RE
[Thu Aug 25 01:04:01 2022]
Error in rule finalize:
    jobid: 7
    output: final-viral-combined.fa, final-viral-score.tsv
    conda-env: /Users/yugenuni/VirSorter2/db/conda_envs/2ec0540b
    shell:

        echo iter-0/*/all.pdg.gff.splitdir/all.pdg.gff.*.split | xargs rm -f
        python /Users/yugenuni/VirSorter2/virsorter/./scripts/filter-score-table.py config.yaml iter-0/viral-combined-proba-more-cols.tsv iter-0/viral-combined.fa final-viral-score.tsv final-viral-combined.fa

        if [ True = "True" ]; then
            mkdir -p for-dramv
            python /Users/yugenuni/VirSorter2/virsorter/./scripts/modify-seqname-for-dramv.py final-viral-combined.fa final-viral-score.tsv -o for-dramv/final-viral-combined-for-dramv.fa
            cp iter-0/viral-affi-contigs-for-dramv.tab for-dramv
        fi

        N_viral_fullseq=$(grep -c '^>.*||full$' final-viral-combined.fa || :)
        N_viral_lt2gene=$(grep -c '^>.*||lt2gene$' final-viral-combined.fa || :)
        if [ True = True ]; then
            Dramv_notes="for-dramv                  ==> dir with input files for dramv"
            Dramv_notes2="For seqnames in files for dramv, 
                | is replaced with _ to be compatible with DRAMv"
        else
            Dramv_notes=""
            Dramv_notes2=""
        fi
        if [ True = True ]; then
            sed -i -E 's/(\|\|full([[:space:]]+)|\|\|[0-9]+_partial([[:space:]]+)|\|\|lt2gene([[:space:]]+))/\2\3\4/;' final-viral-score.tsv
            sed -i -E 's/(\|\|full$|\|\|[0-9]+_partial$|\|\|lt2gene$)//;' final-viral-combined.fa
            if [ True = True ]; then
                sed -i -E 's/(__full(\|[0-9]+\|(c|l)$)|__[0-9]+_partial(\|[0-9]+\|(c|l)$)|__lt2gene(\|[0-9]+\|(c|l)$))/\2\4\6/;'  for-dramv/viral-affi-contigs-for-dramv.tab
                sed -i -E 's/(__full(__[0-9]+\|)|__[0-9]+_partial(__[0-9]+\|)|__lt2gene(__[0-9]+\|))/\2\3\4/;' for-dramv/viral-affi-contigs-for-dramv.tab
                sed -i -E 's/(__full(-cat_[1-6]$)|__[0-9]+_partial(-cat_[1-6]$)|__lt2gene(-cat_[1-6]$))/\2\3\4/;' for-dramv/final-viral-combined-for-dramv.fa 
            fi
            Suffix_notes=""
        else
            Suffix_notes="
            Suffix is added to seq names in final-viral-combined.fa:
            contigs (>=2 genes) as viral:   ||full
            contigs (< 2 genes) as viral:   ||lt2gene
            $Dramv_notes2
            "
        fi

        printf "
        ====> VirSorter run (non-provirus mode) finished.
        # of contigs w/ >=2 genes as viral: $N_viral_fullseq
        # of contigs w/ < 2 genes as viral: $N_viral_lt2gene

        Useful output files:
        final-viral-score.tsv      ==> score table
        final-viral-combined.fa    ==> all viral seqs
        $Dramv_notes
        $Suffix_notes

        NOTES: 
        Users can further screen the results based on the 
            following columns in final-viral-score.tsv
            - contig length (length) 
            - hallmark gene count (hallmark)
            - viral gene %% (viral) 
            - cellular gene %% (cellular)
        The "group" field in final-viral-score.tsv should NOT be used
            as reliable taxonomy info
        We recommend this SOP/tutorial for quality control 
            (make sure to use the lastest version):
            https://dx.doi.org/10.17504/protocols.io.bwm5pc86

        <====
        " | python /Users/yugenuni/VirSorter2/virsorter/./scripts/echo.py

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

*** An error occurred. Detailed errors may not be printed for certain rules. Refer to the log file of the failed command for troubleshooting

Many thanks, Yugen

jiarong commented 2 years ago

Hi Yugen, sorry for the late reply. I had quite a busy week. I have not seen this error before.

yugen-miyahara commented 2 years ago

Hi jiarong, that's no problem.

I am running this on macOS. The first round ran fine, but running the CheckV output through VirSorter2 again gave the error.

jiarong commented 2 years ago

The first pass worked because it did not use --prep-for-dramv. Some of the code behind that option is not supported on macOS.
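For context, the `\2 not defined in the RE` message is consistent with a known difference between GNU sed and the BSD sed shipped with macOS: BSD sed requires a backup suffix argument after `-i`, so in `sed -i -E '...'` it most likely reads `-E` as that suffix and then parses the expression as a basic regular expression, where the unescaped parentheses define no groups. The sketch below shows one portable way to write the first suffix-stripping command from the rule above. It is an assumption about the cause and not the fix used in VirSorter2; the function name `strip_suffix_portable` and the demo file contents are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: strip the ||full, ||N_partial and ||lt2gene suffixes from the
# sequence-name column in a way that both GNU sed and BSD/macOS sed accept.
set -euo pipefail

# strip_suffix_portable is a hypothetical helper, not part of VirSorter2.
strip_suffix_portable() {
    local file="$1"
    # Attaching the suffix to -i (no space) is parsed the same way by GNU and
    # BSD sed, so -E is still understood as "extended regular expressions".
    sed -E -i.bak \
        's/(\|\|full([[:space:]]+)|\|\|[0-9]+_partial([[:space:]]+)|\|\|lt2gene([[:space:]]+))/\2\3\4/' \
        "$file"
    # Remove the backup copy that -i.bak leaves behind.
    rm -f "$file.bak"
}

# Demo on a made-up two-line score table (tab-separated).
demo=$(mktemp "${TMPDIR:-/tmp}/vs2-demo.XXXXXX")
printf 'contig_1||full\t0.98\ncontig_2||lt2gene\t0.71\n' > "$demo"
strip_suffix_portable "$demo"
cat "$demo"
rm -f "$demo"
```

On macOS, another option is to install GNU sed and put it ahead of the system sed on `PATH`, or to run the pipeline in a Linux container.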

yugen-miyahara commented 2 years ago

I ended up using docker and am now able to successfully run it.

Thank you for your help, Yugen