jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0

Module 'skip' not found. #136

Open tnn111 opened 2 years ago

tnn111 commented 2 years ago

I installed using mamba. When I run, I get the following error concerning module 'skip'. I checked and it's installed and Python finds it just fine. Any idea as to what to try?

    [2022-09-29 14:16 INFO] VirSorter 2.2.3
    [2022-09-29 14:16 INFO] /home/torben/opt/mambaforge/envs/virsorter/bin/virsorter run -w virsorter -i medaka/consensus.fasta -j 16
    [2022-09-29 14:16 INFO] Using /home/torben/opt/mambaforge/envs/virsorter/lib/python3.10/site-packages/virsorter/template-config.yaml as config template
    [2022-09-29 14:16 INFO] conig file written to /Data/Ashby0008-0009-0012-0014/virsorter/config.yaml

    [2022-09-29 14:16 INFO] Executing: snakemake --snakefile /home/torben/opt/mambaforge/envs/virsorter/lib/python3.10/site-packages/virsorter/Snakefile --directory /Data/Ashby0008-0009-0012-0014/virsorter --jobs 16 --configfile /Data/Ashby0008-0009-0012-0014/virsorter/config.yaml --latency-wait 600 --rerun-incomplete --nolock --conda-frontend mamba --conda-prefix /home/torben/Data/virsorter/db/conda_envs --use-conda --quiet all
    Job counts:
        count   jobs
        1       all
        1       check_point_for_reclassify
        1       circular_linear_split
        1       classify
        2       classify_by_group
        2       classify_full_and_part_by_group
        1       combine_linear_circular
        2       combine_linear_circular_by_group
        1       extract_feature
        1       extract_provirus_seqs
        1       finalize
        1       gff_feature
        2       gff_feature_by_group
        2       hmm_features_by_group
        1       hmm_sort_to_best_hit_taxon
        2       hmm_sort_to_best_hit_taxon_by_group
        1       merge_classification
        1       merge_full_and_part_classification
        2       merge_hmm_gff_features_by_group
        2       merge_provirus_call_by_group_by_split
        1       merge_provirus_call_from_groups
        5       merge_split_hmmtbl
        10      merge_split_hmmtbl_by_group
        10      merge_split_hmmtbl_by_group_tmp
        1       pick_viral_fullseq
        1       preprocess
        1       split_faa
        2       split_faa_by_group
        2       split_gff_by_group
        61
    [2022-09-29 14:16 INFO] # of seqs < 0 bp and removed: 0
    [2022-09-29 14:16 INFO] # of circular seqs: 191
    [2022-09-29 14:16 INFO] # of linear seqs : 107786
    [2022-09-29 14:16 INFO] Finish spliting circular contig file with common rbs
    [2022-09-29 14:16 INFO] Finish spliting linear contig file with common rbs
    Traceback (most recent call last):
      File "/home/torben/opt/mambaforge/envs/virsorter/lib/python3.10/site-packages/virsorter/./scripts/echo.py", line 6, in <module>
        import click
    ModuleNotFoundError: No module named 'click'
    [Thu Sep 29 14:36:16 2022]
    Error in rule preprocess:
        jobid: 1
        output: Done-preprocess
        shell:

    echo "Step 1 - preprocess finished." | python /home/torben/opt/mambaforge/envs/virsorter/lib/python3.10/site-packages/virsorter/./scripts/echo.py

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

*** An error occurred. Detailed errors may not be printed for certain rules.
Refer to the log file of the failed command for troubleshooting
Issues can be raised at: https://github.com/jiarong/VirSorter2/issues

jiarong commented 2 years ago

Hi Torben, I am also puzzled that the module (click) works at the beginning but stops working in the middle of the pipeline. Can you check whether this error is reproducible? If you are running on a cluster, the Singularity (now called Apptainer) version (installation option 3) should be the most reliable.

tnn111 commented 2 years ago

Hi Jiarong,

I checked and it’s reproducible. I tried the Singularity version as well and I got a different error. No matter what input I provided, it showed as having no data in it. Yes, I used absolute paths. I’ll look some more at that.

And yeah, I know ‘click’ is there. I can start up Python and import it just fine.

I followed the installation instructions exactly too.

Thanks, Torben
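
The traceback above shows `import click` failing inside a shell command launched by the pipeline, even though an interactive Python session can import it. One way to narrow this down is to test the import from each interpreter separately, since the `python` found on PATH inside a rule may not be the one started by hand. The following is a minimal sketch; `can_import` is a hypothetical helper, not part of VirSorter2, and the thread does not confirm that an interpreter mismatch was the cause here.

```python
import shutil
import subprocess
import sys


def can_import(python_exe: str, module: str) -> bool:
    """Return True if `python_exe -c "import <module>"` exits with status 0."""
    result = subprocess.run(
        [python_exe, "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Compare the interpreter running this script with the first `python` on PATH.
    candidates = {
        "sys.executable": sys.executable,
        "python on PATH": shutil.which("python"),
    }
    for label, exe in candidates.items():
        if exe is None:
            print(f"{label}: not found")
            continue
        print(f"{label}: {exe} -> click importable: {can_import(exe, 'click')}")
```

If the two lines report different paths or different results, the environment seen by the failing rule differs from the interactive one.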

jiarong commented 2 years ago

Did you run the Singularity version as below?

    /absolute/path/to/virsorter2.sif -w virsorter.out -i medaka/consensus.fasta

If your cluster sys admin has changed the Singularity default file system binding, try:

    singularity run -B $PWD,$HOME /absolute/path/to/virsorter2.sif -w virsorter.out -i medaka/consensus.fasta

tnn111 commented 2 years ago

Hi Jiarong,

I have tried everything I can think of. I am the system admin and singularity is installed just fine.

Here is what I keep getting:

    [2022-10-02 23:48 INFO] VirSorter 2.2.3
    [2022-10-02 23:48 INFO] /usr/local/bin/virsorter run -w /home/torben/virsorter -i /home/torben/consensus.fasta -j 16
    [2022-10-02 23:48 INFO] Using /usr/local/lib/python3.9/site-packages/virsorter/template-config.yaml as config template
    [2022-10-02 23:48 INFO] conig file written to /home/torben/virsorter/config.yaml

    [2022-10-02 23:48 INFO] Executing: snakemake --snakefile /usr/local/lib/python3.9/site-packages/virsorter/Snakefile --directory /home/torben/virsorter --jobs 16 --configfile /home/torben/virsorter/config.yaml --latency-wait 600 --rerun-incomplete --nolock --conda-frontend mamba --conda-prefix /db/conda_envs --use-conda --quiet all
    Job counts:
        count   jobs
        1       all
        1       check_point_for_reclassify
        1       circular_linear_split
        1       classify
        2       classify_by_group
        2       classify_full_and_part_by_group
        1       combine_linear_circular
        2       combine_linear_circular_by_group
        1       extract_feature
        1       extract_provirus_seqs
        1       finalize
        1       gff_feature
        2       gff_feature_by_group
        2       hmm_features_by_group
        1       hmm_sort_to_best_hit_taxon
        2       hmm_sort_to_best_hit_taxon_by_group
        1       merge_classification
        1       merge_full_and_part_classification
        2       merge_hmm_gff_features_by_group
        2       merge_provirus_call_by_group_by_split
        1       merge_provirus_call_from_groups
        5       merge_split_hmmtbl
        10      merge_split_hmmtbl_by_group
        10      merge_split_hmmtbl_by_group_tmp
        1       pick_viral_fullseq
        1       preprocess
        1       split_faa
        2       split_faa_by_group
        2       split_gff_by_group
        61
    [Sun Oct 2 23:48:46 2022]
    Error in rule circular_linear_split:
        jobid: 8
        output: iter-0/pp-seqname-length.tsv
        conda-env: /db/conda_envs/a2041c3e
        shell:

    # prep_logdir
    mkdir -p log/iter-0/step1-pp log/iter-0/step2-extract-feature log/iter-0/step3-classify

    Cnt=$(grep -c '^>' /home/torben/consensus.fasta)
    if [ ${Cnt} = 0 ]; then
        echo "No sequnences found in contig file; exiting"               | python /usr/local/lib/python3.9/site-packages/virsorter/./scripts/echo.py --level error
        exit 1
    fi

    python /usr/local/lib/python3.9/site-packages/virsorter/./scripts/circular-linear-split.py           /home/torben/consensus.fasta           iter-0/pp-circular.fna.preext          iter-0/pp-linear.fna           iter-0/pp-seqname-length.tsv           "||rbs:common"           0

    if [ ! -s iter-0/pp-circular.fna.preext ]; then
        echo "No circular seqs found in contig file"               | python /usr/local/lib/python3.9/site-packages/virsorter/./scripts/echo.py
        rm iter-0/pp-circular.fna.preext
    else
        python /usr/local/lib/python3.9/site-packages/virsorter/./scripts/circular-extend.py               iter-0/pp-circular.fna.preext iter-0/pp-circular.fna
    fi

    if [ ! -s iter-0/pp-linear.fna ]; then
        echo "No linear seqs found in contig file"               | python /usr/local/lib/python3.9/site-packages/virsorter/./scripts/echo.py
        rm iter-0/pp-linear.fna
    fi

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

*** An error occurred. Detailed errors may not be printed for certain rules.
Refer to the log file of the failed command for troubleshooting
Issues can be raised at: https://github.com/jiarong/VirSorter2/issues
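
The failing rule starts by counting FASTA header lines with `grep -c '^>'`. Note that `grep` exits with status 1 when nothing matches and with status 2 when the file cannot be opened, so under bash strict mode either case can stop the rule at that line. The same count can be reproduced from inside the container to see whether the input file is visible and non-empty there. This is a minimal sketch; `count_fasta_records` is a hypothetical helper and the path in the usage comment is only the one from the log above.

```python
def count_fasta_records(path: str) -> int:
    """Count lines starting with '>' (the same thing `grep -c '^>'` reports)."""
    count = 0
    # errors="replace" keeps the count going even if the file has odd bytes.
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


# Usage (path taken from the log above; adjust to your own input):
#   print(count_fasta_records("/home/torben/consensus.fasta"))
```

A `FileNotFoundError` here, when run inside the container, would suggest the directory holding the input is not bound into it.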

jiarong commented 2 years ago

Hi Torben, sorry for the delay. What are your Linux and Singularity versions? And does the test example in the README show the same error?

tnn111 commented 2 years ago

Hi Jiarong,

I installed a fresh machine this past week and I put Apptainer on it. Then I ran using your test data and all was well; I thought I was home free.

But then I tried to run with my own data and although it ran for a bit, it also errored out and I can’t figure out why. Any help appreciated. The output is attached.

Thanks, Torben

    [2022-10-22 22:01 INFO] # of seqs < 1500 bp and removed: 8987
    [2022-10-22 22:01 INFO] # of circular seqs: 156
    [2022-10-22 22:01 INFO] # of linear seqs : 98834
    [2022-10-22 22:01 INFO] Finish spliting circular contig file with common rbs
    [2022-10-22 22:01 INFO] Finish spliting linear contig file with common rbs
    [Sat Oct 22 22:07:50 2022]
    Error in rule gene_call:
        jobid: 146
        output: iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split.pdg.splitgff, iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split.pdg.splitfaa
        conda-env: /db/conda_envs/a2041c3e
        shell:

    Log='iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split.pdg.log'
    prodigal -p meta -i iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split -a iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split.pdg.splitfaa -o iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split.pdg.splitgff -f gff  &> $Log || { echo "See error details in $Log" | python /usr/local/lib/python3.9/site-packages/virsorter/./scripts/echo.py --level error; exit 1; }
    rm -f $Log

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    [Sat Oct 22 22:07:50 2022]
    Error in rule gene_call:
        jobid: 131
        output: iter-0/pp-linear.fna.splitdir/pp-linear.fna.38.split.pdg.splitgff, iter-0/pp-linear.fna.splitdir/pp-linear.fna.38.split.pdg.splitfaa
        conda-env: /db/conda_envs/a2041c3e
        shell:

    Log='iter-0/pp-linear.fna.splitdir/pp-linear.fna.38.split.pdg.log'
    prodigal -p meta -i iter-0/pp-linear.fna.splitdir/pp-linear.fna.38.split -a iter-0/pp-linear.fna.splitdir/pp-linear.fna.38.split.pdg.splitfaa -o iter-0/pp-linear.fna.splitdir/pp-linear.fna.38.split.pdg.splitgff -f gff  &> $Log || { echo "See error details in $Log" | python /usr/local/lib/python3.9/site-packages/virsorter/./scripts/echo.py --level error; exit 1; }
    rm -f $Log

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    [Sat Oct 22 22:07:50 2022]
    Error in rule gene_call:
        jobid: 98
        output: iter-0/pp-linear.fna.splitdir/pp-linear.fna.5.split.pdg.splitgff, iter-0/pp-linear.fna.splitdir/pp-linear.fna.5.split.pdg.splitfaa
        conda-env: /db/conda_envs/a2041c3e
        shell:

    Log='iter-0/pp-linear.fna.splitdir/pp-linear.fna.5.split.pdg.log'
    prodigal -p meta -i iter-0/pp-linear.fna.splitdir/pp-linear.fna.5.split -a iter-0/pp-linear.fna.splitdir/pp-linear.fna.5.split.pdg.splitfaa -o iter-0/pp-linear.fna.splitdir/pp-linear.fna.5.split.pdg.splitgff -f gff  &> $Log || { echo "See error details in $Log" | python /usr/local/lib/python3.9/site-packages/virsorter/./scripts/echo.py --level error; exit 1; }
    rm -f $Log

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

*** An error occurred. Detailed errors may not be printed for certain rules.
Refer to the log file of the failed command for troubleshooting
Issues can be raised at: https://github.com/jiarong/VirSorter2/issues

    Command exited with non-zero status 1
    16109.99user 52.14system 9:40.57elapsed 2783%CPU (0avgtext+0avgdata 111336maxresident)k
    90636inputs+15892336outputs (764major+10371027minor)pagefaults 0swaps

jiarong commented 2 years ago

Looks like prodigal failed for some sequences. Is there anything in iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split.pdg.log? Running out of disk space or exceeding the file number limit might also cause errors, since VS2 generates a lot of intermediate files.
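
Free disk space, free inodes, and the number of intermediate files can be checked quickly from Python. This is a minimal sketch, assuming a Unix-like system (`os.statvfs` is not available on Windows); `storage_report` is a hypothetical helper name, and the directory to pass is whatever was given to `-w`.

```python
import os
import shutil


def storage_report(path: str) -> dict:
    """Summarize free space, free inodes, and file count under `path`."""
    usage = shutil.disk_usage(path)   # total, used, free (bytes)
    stats = os.statvfs(path)          # Unix only; f_favail = inodes free for this user
    # Count regular directory entries recursively (intermediate files pile up here).
    entries = sum(len(files) for _, _, files in os.walk(path))
    return {
        "free_gib": usage.free / 2**30,
        "free_inodes": stats.f_favail,
        "entries": entries,
    }


# Usage (working directory from the log above; adjust as needed):
#   print(storage_report("/home/torben/virsorter"))
```

Some filesystems report 0 for inode counts, so a zero in `free_inodes` is not by itself proof of exhaustion.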

tnn111 commented 2 years ago

Hi Jiarong,

Nothing there. It looks like the log file is being removed somehow.

It’s running as the only task on a system with 1 TB of main memory, >100 TB of available disk and 64 cores….

What’s also interesting is that if I try to run it as a single command using “/home/torben/bin/virsorter2.sif run ……”, it fails, even on the test data. But if I use “apptainer shell ….” I can run the command just fine and it doesn’t have errors. Not sure what the difference is.

The system is a brand new CentOS system I just got a few weeks ago and nothing else is on it.

Thanks, Torben

jiarong commented 2 years ago

Did you look for iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split.pdg.log within the output directory specified by -w? If it has been deleted, that means the rule finished successfully (rm -f $Log is the last command of the rule). Another way to check whether prodigal has finished is to check whether the last contig in iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split has GFF output in iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split.pdg.splitgff.
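
The last-contig check described above can be sketched in a few lines of Python. Both function names are hypothetical. The sketch assumes that Prodigal's GFF output uses the first whitespace-delimited token of the FASTA header as the sequence ID in column 1, and that each sequence also gets a comment line containing `seqhdr="..."`; treat both as assumptions to verify against your own files.

```python
def last_fasta_id(fasta_path):
    """Return the ID (first token of the header) of the last record, or None."""
    last = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                last = fields[0] if fields else ""
    return last


def gff_mentions(gff_path, seq_id):
    """Return True if seq_id appears in column 1 or in a seqhdr="..." comment."""
    marker = 'seqhdr="'
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("#"):
                # Assumed comment format: ...;seqhdr="<original FASTA header>"
                pos = line.find(marker)
                if pos != -1:
                    header = line[pos + len(marker):].split()
                    if header and header[0].rstrip('"') == seq_id:
                        return True
            elif line.split("\t", 1)[0] == seq_id:
                return True
    return False


# Usage (paths relative to the -w directory, taken from the log above):
#   split = "iter-0/pp-linear.fna.splitdir/pp-linear.fna.53.split"
#   print(gff_mentions(split + ".pdg.splitgff", last_fasta_id(split)))
```

A result of `False` would suggest that the GFF file was truncated before the last contig was written.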

tnn111 commented 2 years ago

Hi Jiarong,

I think I found the problem and it’s not on your end at all. It’s caused by a disk write error on the system.

I managed to run VirSorter2 on a substantial sample. I ended up with ~50,000 high-quality phage (using 0.9 as a cutoff). Most of them are dsDNA, which makes sense since I’m using long-read sequencing and I shouldn’t be able to pick up ssDNA at all.

A couple of questions:

i) Does Virsorter2 pick up eukaryotic viruses too? I’m assuming it does and just labels them as phage.

ii) Is either of you aware of a good pipeline to go all the way to classification?

iii) With long read sequencing these days, we often know which ones are circular. It’d be nice to have a way of passing that information to the software.

I have another sample where I expect more like 150,000 high quality phage and I’m looking for good ways of processing the data.

Thanks; it all works quite well now!

Torben

jiarong commented 2 years ago

Glad it's working for you. i) The dsDNAphage model is trained with dsDNA bacterial phage sequences. It might pick up some eukaryotic viruses with viral gene HMMs, but this has not been tested. ii) Check out vContact2 for novel viruses (https://bitbucket.org/MAVERICLab/vcontact2/src/master/); for known viruses, kraken2 or any tool that aligns contigs to reference genomes should work. iii) Thanks for the suggestion.

jiarong commented 2 years ago

I forgot to mention that I would use a minimal length cutoff of 5 kb (shorter sequences are not reliable for prediction with gene-based tools such as VirSorter2). Also, a score cutoff of 0.9 should be OK, but this SOP with screening steps is the recommended way to run VirSorter2: https://www.protocols.io/view/viral-sequence-identification-sop-with-virsorter2-5qpvoyqebg4o/v3
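
The two cutoffs mentioned above (length of at least 5 kb, score of at least 0.9) can be applied to the score table after a run. This is a minimal sketch that assumes a tab-separated table with `seqname`, `max_score` and `length` columns, as in VirSorter2's `final-viral-score.tsv`; check the header of your own file, since the column names here are an assumption and the helper names are hypothetical. It does not replace the screening steps of the SOP.

```python
import csv


def filter_scores(rows, min_score=0.9, min_length=5000):
    """Return seqnames whose max_score and length both meet the cutoffs."""
    kept = []
    for row in rows:
        # A "nan" score compares False against the cutoff and is dropped.
        if float(row["max_score"]) >= min_score and int(row["length"]) >= min_length:
            kept.append(row["seqname"])
    return kept


def filter_score_file(path, min_score=0.9, min_length=5000):
    """Read a tab-separated score table and apply filter_scores to it."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return filter_scores(reader, min_score, min_length)


# Usage (file name assumed; it sits in the directory given to -w):
#   for name in filter_score_file("virsorter.out/final-viral-score.tsv"):
#       print(name)
```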