jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0

Error in rule merge_split_faa_gff: #147

Open DugauquierR opened 1 year ago

DugauquierR commented 1 year ago

Dear,

I ran VirSorter2 on several genomes in a single FASTA file, but each time I get the following error. Could you help me?

Kind regards

[2023-01-03 17:07 INFO] VirSorter 2.2.3
[2023-01-03 17:07 INFO] /home/bioleia/anaconda3/envs/vs2/bin/virsorter run -w /data/Data/database/HumanGut/fna/humangut_virsorter.out -i /data/Data/database/HumanGut/fna/human_gut_all_seq.fna --min-length 1500 -j 4 all
[2023-01-03 17:07 INFO] Using /home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/template-config.yaml as config template
[2023-01-03 17:07 INFO] conig file written to /data/Data/database/HumanGut/fna/humangut_virsorter.out/config.yaml

[2023-01-03 17:07 INFO] Executing: snakemake --snakefile /home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/Snakefile --directory /data/Data/database/HumanGut/fna/humangut_virsorter.out --jobs 4 --configfile /data/Data/database/HumanGut/fna/humangut_virsorter.out/config.yaml --latency-wait 600 --rerun-incomplete --nolock --conda-frontend mamba --conda-prefix /data/Data/database/HumanGut/fna/db/conda_envs --use-conda --quiet all
Job counts:
    count   jobs
    1       all
    1       check_point_for_reclassify
    1       circular_linear_split
    1       classify
    2       classify_by_group
    2       classify_full_and_part_by_group
    1       combine_linear_circular
    2       combine_linear_circular_by_group
    1       extract_feature
    1       extract_provirus_seqs
    1       finalize
    1       gff_feature
    2       gff_feature_by_group
    2       hmm_features_by_group
    1       hmm_sort_to_best_hit_taxon
    2       hmm_sort_to_best_hit_taxon_by_group
    1       merge_classification
    1       merge_full_and_part_classification
    2       merge_hmm_gff_features_by_group
    2       merge_provirus_call_by_group_by_split
    1       merge_provirus_call_from_groups
    5       merge_split_hmmtbl
    10      merge_split_hmmtbl_by_group
    10      merge_split_hmmtbl_by_group_tmp
    1       pick_viral_fullseq
    1       preprocess
    1       split_faa
    2       split_faa_by_group
    2       split_gff_by_group
    61
[2023-01-03 17:37 INFO] # of seqs < 1500 bp and removed: 896070
[2023-01-03 17:37 INFO] # of circular seqs: 6797
[2023-01-03 17:37 INFO] # of linear seqs : 9353701
[2023-01-03 17:38 INFO] Finish spliting circular contig file with common rbs
[2023-01-03 17:56 INFO] Finish spliting linear contig file with common rbs
[Wed Jan 4 15:08:49 2023]
Error in rule merge_split_faa_gff:
    jobid: 96
    output: iter-0/pp-linear.gff, iter-0/pp-linear.faa
    conda-env: /data/Data/database/HumanGut/fna/db/conda_envs/b15e279e
    shell:

    printf "%s

" iter-0/pp-linear.fna.splitdir/pp-linear.fna.320.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-linear.fna.553.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-linear.fna.0.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-linear.fna.1.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-linear.fna.10.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-linear.fna.100.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-linear.fna.101.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-linear.fna.102.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-linear.fna.103.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-linear.fna.104.split.pdg.splitgff iter-0/pp-linear.fna.splitdir/pp-li..."

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

*** An error occurred. Detailed errors may not be printed for certain rules.
Refer to the log file of the failed command for troubleshooting
Issues can be raised at: https://github.com/jiarong/VirSorter2/issues

jiarong commented 1 year ago

Hi, are you running on Linux? Also make sure there is enough disk space and that you have not hit a limit on the number of files allowed.
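A rough way to check these on Linux from Python is sketched below. It assumes "file number allowed" means either the per-process open-file limit or the free inodes on the filesystem (the comment does not say which), so it reports both along with free disk space. The function name `resource_report` is made up for this sketch and is not part of VirSorter2.

```python
import os
import resource
import shutil


def resource_report(path="."):
    """Collect free disk space, free inodes, and the open-file limit for `path`."""
    usage = shutil.disk_usage(path)   # total, used, free (bytes)
    vfs = os.statvfs(path)            # filesystem statistics (POSIX only)
    # Soft and hard limits on open file descriptors for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "free_gib": usage.free / 1024**3,
        "free_inodes": vfs.f_favail,  # inodes available to unprivileged users
        "open_files_soft": soft,
        "open_files_hard": hard,
    }


if __name__ == "__main__":
    # Assumption: run this from (or pass) the VirSorter2 working directory.
    for key, value in resource_report(".").items():
        print(f"{key}: {value}")
```

Run it in the directory passed to `-w`, since that is where the split files are written. A very large input produces many split files, so free inodes can matter as much as free bytes.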

DugauquierR commented 1 year ago

Hello, sorry for the delay in my response. Yes, I'm running on Linux. I checked, and I have enough disk space and have not hit the limit on the number of files. I tried several times with different values of -j, but I always get the same error.

jiarong commented 1 year ago

Not sure. Does it work on a smaller dataset, e.g. the test example?
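If it helps, a small subset of the real input can also be cut out for a quick run. This is only a sketch: the function name `first_n_records` and the file names in the usage note are placeholders, not anything shipped with VirSorter2.

```python
def first_n_records(fasta_in, fasta_out, n=10):
    """Copy the first `n` FASTA records to `fasta_out`; return how many were written."""
    written = 0
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if written == n:
                    break          # stop at the header of record n+1
                written += 1
            if written:            # ignore anything before the first header
                dst.write(line)
    return written
```

For example, `first_n_records("human_gut_all_seq.fna", "subset.fna", n=1000)` writes the first 1000 records to a hypothetical `subset.fna`, which can then be passed to `-i` with the same options as the failing run.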

DugauquierR commented 1 year ago

Thanks for your response. I ran VirSorter2 on the test example and got the error below. I suspect a problem with one of the required dependencies.

(vs2) bioleia@bioleia:/data/Lara/genetic_switch$ virsorter run -w test.out -i test.fa --min-length 1500 -j 4 all
[2023-01-11 10:02 INFO] VirSorter 2.2.3
[2023-01-11 10:02 INFO] /home/bioleia/anaconda3/envs/vs2/bin/virsorter run -w test.out -i test.fa --min-length 1500 -j 4 all
[2023-01-11 10:02 INFO] Using /home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/template-config.yaml as config template
[2023-01-11 10:02 INFO] conig file written to /data/Lara/genetic_switch/test.out/config.yaml

[2023-01-11 10:02 INFO] Executing: snakemake --snakefile /home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/Snakefile --directory /data/Lara/genetic_switch/test.out --jobs 4 --configfile /data/Lara/genetic_switch/test.out/config.yaml --latency-wait 600 --rerun-incomplete --nolock --conda-frontend mamba --conda-prefix /home/bioleia/db/conda_envs --use-conda --quiet all
Job counts:
    count   jobs
    1       all
    1       check_point_for_reclassify
    1       circular_linear_split
    1       classify
    2       classify_by_group
    2       classify_full_and_part_by_group
    1       combine_linear_circular
    2       combine_linear_circular_by_group
    1       extract_feature
    1       extract_provirus_seqs
    1       finalize
    1       gff_feature
    2       gff_feature_by_group
    2       hmm_features_by_group
    1       hmm_sort_to_best_hit_taxon
    2       hmm_sort_to_best_hit_taxon_by_group
    1       merge_classification
    1       merge_full_and_part_classification
    2       merge_hmm_gff_features_by_group
    2       merge_provirus_call_by_group_by_split
    1       merge_provirus_call_from_groups
    5       merge_split_hmmtbl
    10      merge_split_hmmtbl_by_group
    10      merge_split_hmmtbl_by_group_tmp
    1       pick_viral_fullseq
    1       preprocess
    1       split_faa
    2       split_faa_by_group
    2       split_gff_by_group
    61
[2023-01-11 10:02 INFO] # of seqs < 1500 bp and removed: 1
[2023-01-11 10:02 INFO] # of circular seqs: 1
[2023-01-11 10:02 INFO] # of linear seqs : 7
[2023-01-11 10:02 INFO] Finish spliting circular contig file with common rbs
[2023-01-11 10:02 INFO] Finish spliting linear contig file with common rbs
[2023-01-11 10:02 INFO] Step 1 - preprocess finished.
[2023-01-11 10:34 INFO] Step 2 - extract-feature finished.
[2023-01-11 10:34 ERROR] See error details in /data/Lara/genetic_switch/test.out/log/iter-0/step3-classify/all-score-dsDNAphage.log
[2023-01-11 10:34 ERROR] See error details in /data/Lara/genetic_switch/test.out/log/iter-0/step3-classify/all-score-ssDNA.log
[Wed Jan 11 10:34:19 2023]
[Wed Jan 11 10:34:19 2023]
Error in rule classify_by_group:
Error in rule classify_by_group:
    jobid: 56
    jobid: 57
    output: iter-0/dsDNAphage/all.pdg.clf
    output: iter-0/ssDNA/all.pdg.clf
    conda-env: /home/bioleia/db/conda_envs/815471a1
    conda-env: /home/bioleia/db/conda_envs/815471a1
shell:

    Log=/data/Lara/genetic_switch/test.out/log/iter-0/step3-classify/all-score-dsDNAphage.log
    python /home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/classify.py iter-0/dsDNAphage/all.pdg.ftr /home/bioleia/db/group/dsDNAphage/model dsDNAphage iter-0/dsDNAphage/all.pdg.clf 2> $Log || { echo "See error details in $Log" | python /home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/echo.py --level error; exit 1; }

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
shell:

    Log=/data/Lara/genetic_switch/test.out/log/iter-0/step3-classify/all-score-ssDNA.log
    python /home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/classify.py iter-0/ssDNA/all.pdg.ftr /home/bioleia/db/group/ssDNA/model ssDNA iter-0/ssDNA/all.pdg.clf 2> $Log || { echo "See error details in $Log" | python /home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/echo.py --level error; exit 1; }

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

*** An error occurred. Detailed errors may not be printed for certain rules.
Refer to the log file of the failed command for troubleshooting
Issues can be raised at: https://github.com/jiarong/VirSorter2/issues

Here is the content of the log file all-score-dsDNAphage.log:

Traceback (most recent call last):
  File "/home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/classify.py", line 77, in <module>
    main()
  File "/home/bioleia/anaconda3/envs/vs2/lib/python3.10/site-packages/virsorter/./scripts/classify.py", line 60, in main
    model = joblib.load(model_f)
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/pickle.py", line 1212, in load
    dispatch[key[0]](self)
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/pickle.py", line 1537, in load_stack_global
    self.append(self.find_class(module, name))
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/pickle.py", line 1579, in find_class
    __import__(module, level=0)
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/ensemble/__init__.py", line 7, in <module>
    from ._forest import RandomForestClassifier
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 56, in <module>
    from ..tree import (DecisionTreeClassifier, DecisionTreeRegressor,
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/tree/__init__.py", line 6, in <module>
    from ._classes import BaseDecisionTree
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 40, in <module>
    from ._criterion import Criterion
  File "sklearn/tree/_splitter.pxd", line 34, in init sklearn.tree._criterion
  File "sklearn/tree/_tree.pxd", line 37, in init sklearn.tree._splitter
  File "sklearn/neighbors/_quad_tree.pxd", line 55, in init sklearn.tree._tree
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/neighbors/__init__.py", line 17, in <module>
    from ._nca import NeighborhoodComponentsAnalysis
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/neighbors/_nca.py", line 22, in <module>
    from ..decomposition import PCA
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/decomposition/__init__.py", line 17, in <module>
    from .dict_learning import dict_learning
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/decomposition/dict_learning.py", line 4, in <module>
    from . import _dict_learning
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/decomposition/_dict_learning.py", line 21, in <module>
    from ..linear_model import Lasso, orthogonal_mp_gram, LassoLars, Lars
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/linear_model/__init__.py", line 12, in <module>
    from ._least_angle import (Lars, LassoLars, lars_path, lars_path_gram, LarsCV,
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/sklearn/linear_model/_least_angle.py", line 30, in <module>
    method='lar', copy_X=True, eps=np.finfo(np.float).eps,
  File "/home/bioleia/db/conda_envs/815471a1/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'float'

Thanks for your help

nahlgren commented 1 year ago

For what it's worth, I'm having the same issue. I ran the test example after installation with conda: virsorter run -w test.out -i test.fa --min-length 1500 -j 4 all

Traceback errors in the file all-score-dsDNAphage.log are nearly identical to those above from DugauquierR with the exception of course of library paths. I don't think it's an issue but I'm using miniconda2 (rather than anaconda3 in the 1st two lines of the Traceback for DugauquierR)

From some Googling, I'm wondering if this is an issue with np.float. I searched through the other Python scripts in the package, and in most other cases np.float32 or np.float64 is used. The exceptions I could find are in the scripts below, which include _least_angle.py, the one that seems to be causing the problem:

... /lib/python3.8/site-packages/sklearn/linear_model/_least_angle.py
... /lib/python3.8/site-packages/sklearn/manifold/_t_sne.py
... /lib/python3.8/site-packages/statsmodels/sandbox/descstats.py
... /lib/python3.8/site-packages/sklearn/utils/estimator_checks.py
... /lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py
... /lib/python3.8/site-packages/sklearn/metrics/pairwise.py

I'm no coding expert, so I'm not sure if np.floating is a problem too, e.g. in:

... /lib/python3.8/site-packages/sklearn/model_selection/_validation.py
... /lib/python3.8/site-packages/patsy/design_info.py
... /lib/python3.8/site-packages/patsy/util.py
... /lib/python3.8/site-packages/sklearn/utils/estimator_checks.py
... /lib/python3.8/site-packages/sklearn/utils/extmath.py
... /lib/python3.8/site-packages/statsmodels/compat/pandas.py
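For context: np.float was only an alias for Python's built-in float. NumPy deprecated it in 1.20 and removed it in 1.24, which is why the old scikit-learn code fails at import time. np.floating is a different thing (the abstract parent of NumPy's floating-point scalar types) and was not removed, so those files should not be the problem. The kind of search described above can be sketched as follows. The helper name `find_bare_np_float` and the directory in the usage note are illustrative, and the pattern only catches the `np.` prefix, not `numpy.float`.

```python
import re
from pathlib import Path

# Matches the bare alias only: the trailing \b rejects np.float32,
# np.float64 and np.floating, where "float" is followed by a word character.
BARE_NP_FLOAT = re.compile(r"\bnp\.float\b")


def find_bare_np_float(root):
    """Return (path, line number, line) for every bare `np.float` under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if BARE_NP_FLOAT.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Pointing `find_bare_np_float` at the `site-packages` directory of the conda environment that Snakemake created should list the same kind of files as above. Patching them by hand is fragile, so pinning NumPy below 1.24 in that environment or upgrading the package is usually the safer route.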

jiarong commented 1 year ago

@DugauquierR and @nahlgren For the numpy "float" attribute issue, see the solutions in the comments on issue #145. The original error from @DugauquierR's data, however, seems to be a different problem.

jiarong commented 1 year ago

@all The numpy "float" attribute issue has been fixed in v2.2.4.

XiaominWang12 commented 1 year ago

Hello, sorry for the delay in my response. Yes, I'm running on Linux. I checked, and I have enough disk space and have not hit the limit on the number of files. I tried several times with different values of -j, but I always get the same error.

I have run into the same error. Have you fixed it?

inspirewind commented 4 days ago

same issue, any update?