merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Problem with anvi-summarize on master #1293

Closed: tdelmont closed this issue 4 years ago

tdelmont commented 4 years ago

Hi,

I'm using v6.1-master here.

I got this error while trying to summarize a collection of viruses (after running the classic HMMs program):

anvi-summarize -c CONTIGS.db -p PROFILE/PROFILE.db -C Genomes -o SUMMARY_hmms
Contigs DB ...................................: Initialized: CONTIGS.db (v. 14)
Auxiliary Data ...............................: Found: PROFILE/AUXILIARY-DATA.db (v. 2)
Profile Super ................................: Initialized with all 28427 splits: PROFILE/PROFILE.db (v. 31)
[10 Nov 19 17:57:04 [Processing "Acanthamoeba_castellanii_medusavirus" (1 of 995)]] Accessing completeness scores ...
Traceback (most recent call last):
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/anvio/bin/anvi-summarize", line 101, in <module>
    main(args)
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/anvio/bin/anvi-summarize", line 63, in main
    summary.process()
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/anvio/anvio/summarizer.py", line 1187, in process
    self.summary['collection'][bin_id] = bin.create()
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/anvio/anvio/summarizer.py", line 1577, in create
    self.access_completeness_scores()
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/anvio/anvio/summarizer.py", line 1608, in access_completeness_scores
    p_completion, p_redundancy, domain, domain_probabilities, info_text, results_dict = self.summary.completeness.get_info_for_splits(set(self.split_names))
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/anvio/anvio/completeness.py", line 389, in get_info_for_splits
    best_matching_domain, domain_probabilities, control_domains, info_text = self.get_best_matching_domain(scg_hmm_hits, observed_genes_per_domain, bin_name)
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/anvio/anvio/completeness.py", line 186, in get_best_matching_domain
    domain_probabilities, actual_domains, control_domains = self.SCG_domain_predictor.predict_from_observed_genes_per_domain(observed_genes_per_domain)
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/anvio/anvio/scgdomainclassifier.py", line 267, in predict_from_observed_genes_per_domain
    domain_probabilities = dict(zip(self.rf.classes, self.rf.classifier.predict_proba([features_vector])[0]))
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/lib/python3.7/site-packages/sklearn/ensemble/forest.py", line 581, in predict_proba
    n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/lib/python3.7/site-packages/sklearn/ensemble/base.py", line 153, in _partition_estimators
    n_jobs = min(_get_n_jobs(n_jobs), n_estimators)
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/lib/python3.7/site-packages/sklearn/utils/__init__.py", line 464, in _get_n_jobs
    if n_jobs < 0:
TypeError: '<' not supported between instances of 'NoneType' and 'int'
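
The last frame of the traceback shows where things break: `n_jobs` arrives as `None` and is compared against `0`. A minimal sketch of that failure mode follows. `resolve_n_jobs` and `try_resolve` are hypothetical names, and the handling of negative values is a placeholder, not scikit-learn's actual logic.

```python
def resolve_n_jobs(n_jobs):
    """Hypothetical stand-in for the comparison in the last traceback frame."""
    if n_jobs < 0:  # raises TypeError when n_jobs is None
        return 1    # placeholder: the real library computes this differently
    return n_jobs


def try_resolve(n_jobs):
    """Return the resolved value, or the TypeError message if the comparison fails."""
    try:
        return resolve_n_jobs(n_jobs)
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_resolve(2))     # an integer passes the comparison
    print(try_resolve(None))  # None reproduces the TypeError message
```

An integer passes through untouched, while `None` fails at the `<` comparison, which matches the message at the end of the traceback.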

Note that other HMMs have been run on this CONTIGS.db (specific targets of interest to my project). Before I ran the single-copy collections with anvi-run-hmms (Bacteria-Archaea-Eukarya), the summary was working just fine.

Thanks for looking into this error message, and my apologies if the mistake is on my side (it would not be the first time).

Best,

Tom

meren commented 4 years ago

Thanks for sending the example data privately so I could try to reproduce this error, Tom.

After a loooooong struggle to figure out what was wrong, I finally realized that everything works with the right Python version, and that I had missed the clue in your logs: you seem to be using Python 3.7 :/

(...)
  File "/Users/tomdelmonttdelmont/virtual-envs/anvio-dev/lib/python3.7/site-packages/sklearn/ensemble/base.py", line 153, in _partition_estimators
(...)

This is despite the installation document emphasizing that anvi'o wants to run on Python 3.6:

http://merenlab.org/2016/06/26/installation-v2/

I think the error has something to do with sklearn / Python 3.7.
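
One way to catch this kind of mismatch early is to compare the running interpreter against the expected version before starting a long job. The sketch below assumes the Python 3.6 requirement mentioned above; `python_matches` and `describe_python` are hypothetical helpers, not part of anvi'o.

```python
import sys


def python_matches(required=(3, 6), version_info=None):
    """Return True if the interpreter's major.minor equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


def describe_python(required=(3, 6), version_info=None):
    """Build a one-line report comparing the running and expected versions."""
    if version_info is None:
        version_info = sys.version_info
    found = "{}.{}".format(version_info[0], version_info[1])
    wanted = "{}.{}".format(required[0], required[1])
    if python_matches(required, version_info):
        return "OK: running Python " + found
    return "Mismatch: running Python " + found + ", expected " + wanted


if __name__ == "__main__":
    # Run this inside the activated virtual environment.
    print(describe_python())
```

Run inside the activated virtual environment, this should flag a setup like the one in the traceback paths, where the packages live under `python3.7`.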