jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0

Error in rule classify_by_group during Step 3 #123

Closed: WoCer2019 closed this issue 2 years ago

WoCer2019 commented 2 years ago

Hi there, I am running VirSorter2 in a Linux shell and getting an error. I am using the latest bioconda version (v2.2.3).

The command I ran was the following:

```
virsorter run --prep-for-dramv -w test_output -i /data/LJ/Data_download/used_data/drep/output/dereplicated_genomes/SRR11673976_bin.10.fa --min-length 5000 -j 100 all
```

Results from stderr:

```
[2022-08-21 18:42 INFO] VirSorter 2.2.3
[2022-08-21 18:42 INFO] /home/test/miniconda3/envs/vs2/bin/virsorter run --prep-for-dramv -w test_output -i /data/LJ/Data_download/used_data/drep/output/dereplicated_genomes/SRR11673976_bin.10.fa --min-length 5000 -j 100 all
[2022-08-21 18:42 INFO] Using /home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/template-config.yaml as config template
[2022-08-21 18:42 INFO] conig file written to /data/LJ/Data_download/used_data/virsorter2/MAG_based/output2/test_output/config.yaml
[2022-08-21 18:42 INFO] Executing: snakemake --snakefile /home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/Snakefile --directory /data/LJ/Data_download/used_data/virsorter2/MAG_based/output2/test_output --jobs 100 --configfile /data/LJ/Data_download/used_data/virsorter2/MAG_based/output2/test_output/config.yaml --latency-wait 600 --rerun-incomplete --nolock --conda-frontend mamba --conda-prefix /data/LJ/software/databases/virsorter_db/db/conda_envs --use-conda --quiet all
Job counts:
    count   jobs
    1       all
    1       check_point_for_reclassify
    1       circular_linear_split
    1       classify
    2       classify_by_group
    2       classify_full_and_part_by_group
    1       combine_linear_circular
    2       combine_linear_circular_by_group
    1       extract_feature
    1       extract_provirus_seqs
    1       finalize
    1       gff_feature
    2       gff_feature_by_group
    2       hmm_features_by_group
    1       hmm_sort_to_best_hit_taxon
    2       hmm_sort_to_best_hit_taxon_by_group
    2       merge_annotation_table_by_group_from_split
    1       merge_annotation_table_from_groups
    1       merge_classification
    1       merge_full_and_part_classification
    2       merge_hmm_gff_features_by_group
    2       merge_provirus_call_by_group_by_split
    1       merge_provirus_call_from_groups
    6       merge_split_hmmtbl
    12      merge_split_hmmtbl_by_group
    12      merge_split_hmmtbl_by_group_tmp
    1       pick_viral_fullseq
    1       preprocess
    1       split_faa
    2       split_faa_by_group
    2       split_gff_by_group
    69
[2022-08-21 18:42 INFO] # of seqs < 5000 bp and removed: 434
[2022-08-21 18:42 INFO] # of circular seqs: 0
[2022-08-21 18:42 INFO] # of linear seqs : 142
[2022-08-21 18:42 INFO] No circular seqs found in contig file
[2022-08-21 18:42 INFO] Finish spliting linear contig file with common rbs
[2022-08-21 18:42 INFO] Step 1 - preprocess finished.
[2022-08-21 19:15 INFO] Step 2 - extract-feature finished.
[2022-08-21 19:15 ERROR] See error details in /data/LJ/Data_download/used_data/virsorter2/MAG_based/output2/test_output/log/iter-0/step3-classify/all-score-ssDNA.log
[Sun Aug 21 19:15:29 2022]
Error in rule classify_by_group:
    jobid: 65
    output: iter-0/ssDNA/all.pdg.clf
    conda-env: /data/LJ/software/databases/virsorter_db/db/conda_envs/9a36acab
    shell:

        Log=/data/LJ/Data_download/used_data/virsorter2/MAG_based/output2/test_output/log/iter-0/step3-classify/all-score-ssDNA.log
        python /home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/./scripts/classify.py iter-0/ssDNA/all.pdg.ftr /data/LJ/software/databases/virsorter_db/db/group/ssDNA/model ssDNA iter-0/ssDNA/all.pdg.clf 2> $Log || { echo "See error details in $Log" | python /home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/./scripts/echo.py --level error; exit 1; }

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[2022-08-21 19:15 ERROR] See error details in /data/LJ/Data_download/used_data/virsorter2/MAG_based/output2/test_output/log/iter-0/step3-classify/all-score-dsDNAphage.log
[Sun Aug 21 19:15:29 2022]
Error in rule classify_by_group:
    jobid: 64
    output: iter-0/dsDNAphage/all.pdg.clf
    conda-env: /data/LJ/software/databases/virsorter_db/db/conda_envs/9a36acab
    shell:

        Log=/data/LJ/Data_download/used_data/virsorter2/MAG_based/output2/test_output/log/iter-0/step3-classify/all-score-dsDNAphage.log
        python /home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/./scripts/classify.py iter-0/dsDNAphage/all.pdg.ftr /data/LJ/software/databases/virsorter_db/db/group/dsDNAphage/model dsDNAphage iter-0/dsDNAphage/all.pdg.clf 2> $Log || { echo "See error details in $Log" | python /home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/./scripts/echo.py --level error; exit 1; }

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

*** An error occurred. Detailed errors may not be printed for certain rules. Refer to the log file of the failed command for troubleshooting
Issues can be raised at: https://github.com/jiarong/VirSorter2/issues
```
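The stderr only points to per-rule log files. A minimal Python sketch for pulling the last non-empty line (usually the exception) out of each step-3 log, assuming the `log/iter-0/step3-classify/all-score-<group>.log` layout shown in the paths above; the function `last_error_lines` and the script name `show_step3_errors.py` are hypothetical and not part of VirSorter2:

```python
# Sketch: print the last non-empty line (usually the exception) of each
# step-3 classify log. The directory layout is taken from the stderr paths;
# `last_error_lines` and the script name are hypothetical helpers.
import sys
from pathlib import Path


def last_error_lines(workdir: str) -> dict:
    """Map each all-score-<group>.log file name to its last non-empty line."""
    log_dir = Path(workdir) / "log" / "iter-0" / "step3-classify"
    if not log_dir.is_dir():
        return {}
    result = {}
    for log in sorted(log_dir.glob("all-score-*.log")):
        text = log.read_text(errors="replace")
        lines = [line for line in text.splitlines() if line.strip()]
        result[log.name] = lines[-1] if lines else ""
    return result


if __name__ == "__main__":
    # e.g. python show_step3_errors.py test_output
    workdir = sys.argv[1] if len(sys.argv) > 1 else "test_output"
    for name, line in last_error_lines(workdir).items():
        print(f"{name}: {line}")
```

Run against the `-w` directory (here `test_output`), it prints one line per group, which is often enough to tell whether all groups failed for the same reason.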

Looking forward to your reply, thanks!

jiarong commented 2 years ago

Hi, what's the error message in /data/LJ/Data_download/used_data/virsorter2/MAG_based/output2/test_output/log/iter-0/step3-classify/all-score-dsDNAphage.log?

WoCer2019 commented 2 years ago


Thanks for your reply.

Results from all-score-dsDNAphage.log:

```
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator MinMaxScaler from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator GridSearchCV from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator Pipeline from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:443: UserWarning: X has feature names, but MinMaxScaler was fitted without feature names
  warnings.warn(
Traceback (most recent call last):
  File "/home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/./scripts/classify.py", line 77, in <module>
    main()
  File "/home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/./scripts/classify.py", line 68, in main
    pred_prob = model.predict_proba(X)
  File "/home/test/.local/lib/python3.8/site-packages/sklearn/pipeline.py", line 523, in predict_proba
    Xt = transform.transform(Xt)
  File "/home/test/.local/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 509, in transform
    if self.clip:
AttributeError: 'MinMaxScaler' object has no attribute 'clip'
```

Results from all-score-ssDNA.log:

```
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator MinMaxScaler from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator GridSearchCV from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator Pipeline from version 0.22.1 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/test/.local/lib/python3.8/site-packages/sklearn/base.py:443: UserWarning: X has feature names, but MinMaxScaler was fitted without feature names
  warnings.warn(
Traceback (most recent call last):
  File "/home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/./scripts/classify.py", line 77, in <module>
    main()
  File "/home/test/miniconda3/envs/vs2/lib/python3.8/site-packages/virsorter/./scripts/classify.py", line 68, in main
    pred_prob = model.predict_proba(X)
  File "/home/test/.local/lib/python3.8/site-packages/sklearn/pipeline.py", line 523, in predict_proba
    Xt = transform.transform(Xt)
  File "/home/test/.local/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 509, in transform
    if self.clip:
AttributeError: 'MinMaxScaler' object has no attribute 'clip'
```
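Both tracebacks load scikit-learn from `/home/test/.local/lib/python3.8/site-packages`, not from the `vs2` conda environment, and the warnings report that the models were pickled with 0.22.1 but loaded with 1.1.2. A minimal sketch to check which scikit-learn an interpreter would pick up and whether it sits in the user site-packages directory; `describe_module` is a hypothetical helper, and the assumption is that you run it with the same `python` the snakemake jobs call:

```python
# Sketch: report where a module would be imported from and whether that
# location is inside the user site-packages directory (~/.local/...).
# `describe_module` is a hypothetical helper, not part of VirSorter2.
import importlib.util
import site
from importlib import metadata
from pathlib import Path


def describe_module(module: str, dist: str) -> dict:
    """Locate `module` without importing it and look up the version of `dist`."""
    spec = importlib.util.find_spec(module)
    origin = Path(spec.origin).resolve() if spec and spec.origin else None
    try:
        # Version of the first matching distribution found on sys.path.
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        version = None
    user_site = Path(site.getusersitepackages()).resolve()
    return {
        "origin": str(origin) if origin else None,
        "version": version,
        "in_user_site": origin is not None and user_site in origin.parents,
    }


if __name__ == "__main__":
    info = describe_module("sklearn", "scikit-learn")
    print(info)
    # 0.22.1 is the version the unpickling warnings above report.
    if info["version"] not in (None, "0.22.1"):
        print("scikit-learn differs from the version the models were pickled with")
```

If `in_user_site` comes back `True`, the copy under `~/.local` is the one being imported, which matches the paths in the tracebacks.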

Following your suggestion in #73, I modified `$PATH` in my `.bashrc`, but I still get the same error. My current `$PATH`:

```
test@PowerEdge-R740:~$ echo $PATH
/data/LJ/software/databases/virsorter_db/db/conda_envs:/media/LJ/software/Usearch:/data/db/minimap2:/data/SHH-othername/111/ANIcalculator_v1/ANIcalculator_v1/nsimscan:/data/SHH-othername/111/ANIcalculator_v1/ANIcalculator_v1:/home/test/miniconda3/bin:/usr/bin/Trinity:/home/test/go/bin:/usr/lib/trinityrnaseq/bin:/home/test/miniconda3/envs/metabolishmm/bin:/usr/local/go/bin:/home/test/.aspera/connect/bin:/home/test/.local/bin:/media/LJ/software/Usearch:/data/db/minimap2:/data/SHH-othername/111/ANIcalculator_v1/ANIcalculator_v1/nsimscan:/data/SHH-othername/111/ANIcalculator_v1/ANIcalculator_v1:/home/test/miniconda3/bin:/usr/bin/Trinity:/home/test/go/bin:/usr/lib/trinityrnaseq/bin:/home/test/miniconda3/envs/metabolishmm/bin:/usr/local/go/bin:/home/test/.aspera/connect/bin:/home/test/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/test/go/bin:/home/test/go/bin
```
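`$PATH` decides which executables the shell finds (including which `python`), but once an interpreter is chosen, its module search path (`sys.path`) does not depend on `$PATH`. The user site-packages directory (`~/.local/lib/python3.8/site-packages` in the tracebacks) is added to `sys.path` unless it is disabled with `python -s` or `PYTHONNOUSERSITE=1`. A small standard-library sketch that checks this by probing child interpreters; `probe` is a hypothetical helper, and it has not been tested against VirSorter2 itself:

```python
# Sketch: probe fresh interpreters to see what changes sys.path.
# `probe` is a hypothetical helper; only the standard library is used.
import json
import os
import subprocess
import sys

# Code run inside the child interpreter: report the user-site flag and sys.path.
PROBE = (
    "import json, site, sys; "
    "print(json.dumps({'user_site': site.ENABLE_USER_SITE, 'path': sys.path}))"
)


def probe(env_overrides=None, flags=()):
    """Start this interpreter with extra env vars / flags and return its view."""
    env = dict(os.environ)
    env.update(env_overrides or {})
    result = subprocess.run(
        [sys.executable, *flags, "-c", PROBE],
        capture_output=True, text=True, env=env, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    default = probe()
    # Changing $PATH does not alter sys.path of an already-chosen interpreter.
    print("PATH changes sys.path:", probe({"PATH": "/nonexistent"})["path"] != default["path"])
    # Either of these keeps the user site-packages directory off sys.path.
    print("user site (default):", default["user_site"])
    print("user site (PYTHONNOUSERSITE=1):", probe({"PYTHONNOUSERSITE": "1"})["user_site"])
    print("user site (python -s):", probe(flags=("-s",))["user_site"])
```

The interpreter is launched by absolute path (`sys.executable`), so the `$PATH` override cannot change which Python runs; only the flags and environment variables that Python itself reads are being compared.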

jiarong commented 2 years ago

Hi, I recommend installation option 3 to avoid PATH issues.