merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

PyANI fails due to a deprecated pandas feature #1227

Closed: andrewsonin closed this issue 4 years ago

andrewsonin commented 5 years ago

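I tried to test the pangenomic workflow with anvi-self-test --suite pangenomics and got an error. According to the log file, it is caused by pyani's use of the deprecated static method pd.DataFrame.from_csv, which no longer exists in the installed version of pandas (hence the AttributeError in the traceback below).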

Possible solutions:

My OS is Linux Mint 19.2. I installed anvi'o as follows:

conda create -n anvio5 -c bioconda -c conda-forge anvio=5.5.0

The output of anvi-self-test --version:

Anvi'o version ...............................: margaret (v5.5)
Profile DB version ...........................: 31
Contigs DB version ...........................: 12
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1
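The version report above lists anvi'o's own database versions but not the pandas version in the environment, and that is what decides whether pd.DataFrame.from_csv is still available. A minimal diagnostic sketch, meant to be run with the environment's Python (for example after activating anvio5), could look like this; the helper name pandas_has_from_csv is hypothetical and not part of anvi'o or pyani.

```python
import pandas as pd


def pandas_has_from_csv() -> bool:
    """Return True if the installed pandas still provides DataFrame.from_csv."""
    return hasattr(pd.DataFrame, "from_csv")


if __name__ == "__main__":
    # Report the pandas version together with the availability of the old API.
    print("pandas version:", pd.__version__)
    print("DataFrame.from_csv available:", pandas_has_from_csv())
```

If the second line prints False, the call in pyani's anib.py will raise the AttributeError shown in the traceback below.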

The output of cat /tmp/tmp02a1zwsp/test-output/pan_test/ANI_LOG.txt:

# DATE: 06 Sep 19 15:19:34
# CMD LINE: average_nucleotide_identity.py --outdir output --indir /tmp/tmpz_j7gsli --method ANIb --workers 1
Traceback (most recent call last):
  File "/home/sonin/.anaconda3/envs/anvio5/bin/average_nucleotide_identity.py", line 804, in <module>
    results = methods[args.method][0](infiles, org_lengths)
  File "/home/sonin/.anaconda3/envs/anvio5/bin/average_nucleotide_identity.py", line 574, in unified_anib
    fraglengths=fraglengths, mode=args.method)
  File "/home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/pyani/anib.py", line 382, in process_blast
    resultvals = parse_blast_tab(blastfile, fraglengths, mode)
  File "/home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/pyani/anib.py", line 431, in parse_blast_tab
    data = pd.DataFrame.from_csv(filename, header=None, sep='\t')
AttributeError: type object 'DataFrame' has no attribute 'from_csv'
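The failing line in pyani's anib.py is pd.DataFrame.from_csv(filename, header=None, sep='\t'). Below is a sketch of one way an equivalent read could be written with pd.read_csv, which is available in both old and new pandas releases. It assumes the BLAST tabular output should be indexed by its first column (from_csv used index_col=0 by default) and deliberately leaves out from_csv's other default, parse_dates=True, on the assumption that query IDs are not dates. The function name read_blast_tab and the sample rows are hypothetical; this is not pyani's actual fix.

```python
import io

import pandas as pd


def read_blast_tab(path_or_buf):
    """Hypothetical replacement for the removed call
    pd.DataFrame.from_csv(filename, header=None, sep='\t').

    from_csv used the first column as the row index by default, so
    index_col=0 is passed explicitly. parse_dates is omitted on the
    assumption that BLAST query IDs should stay plain strings.
    """
    return pd.read_csv(path_or_buf, header=None, sep="\t", index_col=0)


if __name__ == "__main__":
    # Two made-up rows in BLAST tabular style: query, subject, identity, length.
    sample = "frag1\tref1\t99.5\t100\nfrag2\tref1\t98.0\t90\n"
    df = read_blast_tab(io.StringIO(sample))
    print(df)
```

Because header=None assigns integer column labels and index_col=0 consumes column 0, the remaining columns keep the labels 1, 2 and 3, so code that previously indexed the from_csv result by position-based labels should see the same layout.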

The output of anvi-self-test --suite pangenomics:

:: Output directory ...

/tmp/tmp02a1zwsp/test-output

:: Anvo'o version ...

Anvi'o version ...............................: margaret (v5.5)
Profile DB version ...........................: 31
Contigs DB version ...........................: 12
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

:: Setting up the pan analysis directory ...

:: Generating contigs databases for external genomes ...

:: INPUT DIR: /tmp/tmp02a1zwsp/test-output/pan_test, FNAME: 01 ...

:: RENAMING CONTIGS ...

Input ........................................: /tmp/tmp02a1zwsp/test-output/pan_test/01.fa
Output .......................................: /tmp/tmp02a1zwsp/test-output/pan_test/01-clean.fa
Minimum length ...............................: 0
Total num contigs ............................: 1
Total num nucleotides ........................: 139,930
Contigs removed ..............................: 0 (0.00% of all)
Nucleotides removed ..........................: 0 (0.00% of all)
Deflines simplified ..........................: True

:: GENERATING THE CONTIGS DB ...

Input FASTA file .............................: /tmp/tmp02a1zwsp/test-output/pan_test/01-clean.fa
Name .........................................: 01
Description ..................................: No description is given
Split Length .................................: 20,000                                                                                                                                                                                        
K-mer size ...................................: 4
Skip gene calling? ...........................: False
External gene calls provided? ................: None
Ignoring internal stop codons? ...............: False
Splitting pays attention to gene calls? ......: False

Finding ORFs in contigs
===============================================
Genes ........................................: /tmp/tmpaognu7p1/contigs.genes
Amino acid sequences .........................: /tmp/tmpaognu7p1/contigs.amino_acid_sequences
Log file .....................................: /tmp/tmpaognu7p1/00_log.txt
Result .......................................: Prodigal (v2.6.3) has identified 119 genes.                                                                                                                                                   

Contigs with at least one gene call ..........: 1 of 1 (100.0%)                                                                                                                                                                               
Contigs database .............................: A new database, /tmp/tmp02a1zwsp/test-output/pan_test/01.db, has been created.
Number of contigs ............................: 1
Number of splits .............................: 7
Total number of nucleotides ..................: 139,930
Gene calling step skipped ....................: False
Splits broke genes (non-mindful mode) ........: True
Desired split length (what the user wanted) ..: 20,000
Average split length (wnat anvi'o gave back) .: 19,990

:: RUNNING HMMs ...

Target found .................................: AA:GENE                                                                                                                                                                                       
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/01.db (v. 12)                                                                                                                              

WARNING
===============================================
You did not provide any gene caller ids. As a result, anvi'o will give you back
sequences for every 119 gene call stored in the contigs database.

Output .......................................: /tmp/tmpu_0i4g8i/AA_gene_sequences.fa                                                                                                                                                         
Target found .................................: RNA:CONTIG
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/01.db (v. 12)                                                                                                                              

HMM Profiling for Ribosomal_RNAs
===============================================
Reference ....................................: Seemann T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNAs
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N\A
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Ribosomal_RNAs/genes.hmm.gz
Number of genes ..............................: 12
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpds6fx32c
HMM scan output ..............................: /tmp/tmpds6fx32c/hmm.output
HMM scan hits ................................: /tmp/tmpds6fx32c/hmm.hits
Log file .....................................: /tmp/tmpds6fx32c/00_log.txt
Number of raw hits ...........................: 0                                                                                                                                                                                             

* The HMM source 'Ribosomal_RNAs' returned 0 hits. SAD (but it's stil OK).

HMM Profiling for Rinke_et_al
===============================================
Reference ....................................: Rinke et al, http://www.nature.com/nature/journal/v499/n7459/full/nature12352.html
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: archaea
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Rinke_et_al/genes.hmm.gz
Number of genes ..............................: 162
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpqgakp09a
HMM scan output ..............................: /tmp/tmpqgakp09a/hmm.output
HMM scan hits ................................: /tmp/tmpqgakp09a/hmm.hits
Log file .....................................: /tmp/tmpqgakp09a/00_log.txt
Number of raw hits ...........................: 0                                                                                                                                                                                             

* The HMM source 'Rinke_et_al' returned 0 hits. SAD (but it's stil OK).

HMM Profiling for Campbell_et_al
===============================================
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: bacteria
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpu2hneyd0
HMM scan output ..............................: /tmp/tmpu2hneyd0/hmm.output
HMM scan hits ................................: /tmp/tmpu2hneyd0/hmm.hits
Log file .....................................: /tmp/tmpu2hneyd0/00_log.txt
Number of raw hits ...........................: 6                                                                                                                                                                                             

HMM Profiling for BUSCO_83_Protista
===============================================
Reference ....................................: See Simao et al, doi:10.1093/bioinformatics/btv351, as well as http://merenlab.org/delmont-euk-scgs
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: eukarya
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/BUSCO_83_Protista/genes.hmm.gz
Number of genes ..............................: 83
Noise cutoff term(s) .........................: -E 1e-25
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpl8g03ajx
HMM scan output ..............................: /tmp/tmpl8g03ajx/hmm.output
HMM scan hits ................................: /tmp/tmpl8g03ajx/hmm.hits
Log file .....................................: /tmp/tmpl8g03ajx/00_log.txt
Number of raw hits ...........................: 0                                                                                                                                                                                             

* The HMM source 'BUSCO_83_Protista' returned 0 hits. SAD (but it's stil OK).

:: INPUT DIR: /tmp/tmp02a1zwsp/test-output/pan_test, FNAME: 02 ...

:: RENAMING CONTIGS ...

Input ........................................: /tmp/tmp02a1zwsp/test-output/pan_test/02.fa
Output .......................................: /tmp/tmp02a1zwsp/test-output/pan_test/02-clean.fa
Minimum length ...............................: 0
Total num contigs ............................: 1
Total num nucleotides ........................: 139,930
Contigs removed ..............................: 0 (0.00% of all)
Nucleotides removed ..........................: 0 (0.00% of all)
Deflines simplified ..........................: True

:: GENERATING THE CONTIGS DB ...

Input FASTA file .............................: /tmp/tmp02a1zwsp/test-output/pan_test/02-clean.fa
Name .........................................: 02
Description ..................................: No description is given
Split Length .................................: 20,000                                                                                                                                                                                        
K-mer size ...................................: 4
Skip gene calling? ...........................: False
External gene calls provided? ................: None
Ignoring internal stop codons? ...............: False
Splitting pays attention to gene calls? ......: False

Finding ORFs in contigs
===============================================
Genes ........................................: /tmp/tmp5nqs9hin/contigs.genes
Amino acid sequences .........................: /tmp/tmp5nqs9hin/contigs.amino_acid_sequences
Log file .....................................: /tmp/tmp5nqs9hin/00_log.txt
Result .......................................: Prodigal (v2.6.3) has identified 119 genes.                                                                                                                                                   

Contigs with at least one gene call ..........: 1 of 1 (100.0%)                                                                                                                                                                               
Contigs database .............................: A new database, /tmp/tmp02a1zwsp/test-output/pan_test/02.db, has been created.
Number of contigs ............................: 1
Number of splits .............................: 7
Total number of nucleotides ..................: 139,930
Gene calling step skipped ....................: False
Splits broke genes (non-mindful mode) ........: True
Desired split length (what the user wanted) ..: 20,000
Average split length (wnat anvi'o gave back) .: 19,990

:: RUNNING HMMs ...

Target found .................................: RNA:CONTIG                                                                                                                                                                                    
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/02.db (v. 12)                                                                                                                              
Target found .................................: AA:GENE
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/02.db (v. 12)                                                                                                                              

WARNING
===============================================
You did not provide any gene caller ids. As a result, anvi'o will give you back
sequences for every 119 gene call stored in the contigs database.

Output .......................................: /tmp/tmp162qmeo8/AA_gene_sequences.fa                                                                                                                                                         

HMM Profiling for Ribosomal_RNAs
===============================================
Reference ....................................: Seemann T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNAs
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N\A
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Ribosomal_RNAs/genes.hmm.gz
Number of genes ..............................: 12
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpg9e5skmk
HMM scan output ..............................: /tmp/tmpg9e5skmk/hmm.output
HMM scan hits ................................: /tmp/tmpg9e5skmk/hmm.hits
Log file .....................................: /tmp/tmpg9e5skmk/00_log.txt
Number of raw hits ...........................: 0                                                                                                                                                                                             

* The HMM source 'Ribosomal_RNAs' returned 0 hits. SAD (but it's stil OK).

HMM Profiling for Rinke_et_al
===============================================
Reference ....................................: Rinke et al, http://www.nature.com/nature/journal/v499/n7459/full/nature12352.html
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: archaea
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Rinke_et_al/genes.hmm.gz
Number of genes ..............................: 162
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpwno15se5
HMM scan output ..............................: /tmp/tmpwno15se5/hmm.output
HMM scan hits ................................: /tmp/tmpwno15se5/hmm.hits
Log file .....................................: /tmp/tmpwno15se5/00_log.txt
Number of raw hits ...........................: 0                                                                                                                                                                                             

* The HMM source 'Rinke_et_al' returned 0 hits. SAD (but it's stil OK).

HMM Profiling for Campbell_et_al
===============================================
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: bacteria
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmp3sv8zz_h
HMM scan output ..............................: /tmp/tmp3sv8zz_h/hmm.output
HMM scan hits ................................: /tmp/tmp3sv8zz_h/hmm.hits
Log file .....................................: /tmp/tmp3sv8zz_h/00_log.txt
Number of raw hits ...........................: 6                                                                                                                                                                                             

HMM Profiling for BUSCO_83_Protista
===============================================
Reference ....................................: See Simao et al, doi:10.1093/bioinformatics/btv351, as well as http://merenlab.org/delmont-euk-scgs
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: eukarya
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/BUSCO_83_Protista/genes.hmm.gz
Number of genes ..............................: 83
Noise cutoff term(s) .........................: -E 1e-25
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpgp4qf7ne
HMM scan output ..............................: /tmp/tmpgp4qf7ne/hmm.output
HMM scan hits ................................: /tmp/tmpgp4qf7ne/hmm.hits
Log file .....................................: /tmp/tmpgp4qf7ne/00_log.txt
Number of raw hits ...........................: 0                                                                                                                                                                                             

* The HMM source 'BUSCO_83_Protista' returned 0 hits. SAD (but it's stil OK).

:: INPUT DIR: /tmp/tmp02a1zwsp/test-output/pan_test, FNAME: 03 ...

:: RENAMING CONTIGS ...

Input ........................................: /tmp/tmp02a1zwsp/test-output/pan_test/03.fa
Output .......................................: /tmp/tmp02a1zwsp/test-output/pan_test/03-clean.fa
Minimum length ...............................: 0
Total num contigs ............................: 1
Total num nucleotides ........................: 139,930
Contigs removed ..............................: 0 (0.00% of all)
Nucleotides removed ..........................: 0 (0.00% of all)
Deflines simplified ..........................: True

:: GENERATING THE CONTIGS DB ...

Input FASTA file .............................: /tmp/tmp02a1zwsp/test-output/pan_test/03-clean.fa
Name .........................................: 03
Description ..................................: No description is given
Split Length .................................: 20,000                                                                                                                                                                                        
K-mer size ...................................: 4
Skip gene calling? ...........................: False
External gene calls provided? ................: None
Ignoring internal stop codons? ...............: False
Splitting pays attention to gene calls? ......: False

Finding ORFs in contigs
===============================================
Genes ........................................: /tmp/tmphsyesylc/contigs.genes
Amino acid sequences .........................: /tmp/tmphsyesylc/contigs.amino_acid_sequences
Log file .....................................: /tmp/tmphsyesylc/00_log.txt
Result .......................................: Prodigal (v2.6.3) has identified 117 genes.                                                                                                                                                   

Contigs with at least one gene call ..........: 1 of 1 (100.0%)                                                                                                                                                                               
Contigs database .............................: A new database, /tmp/tmp02a1zwsp/test-output/pan_test/03.db, has been created.
Number of contigs ............................: 1
Number of splits .............................: 7
Total number of nucleotides ..................: 139,930
Gene calling step skipped ....................: False
Splits broke genes (non-mindful mode) ........: True
Desired split length (what the user wanted) ..: 20,000
Average split length (wnat anvi'o gave back) .: 19,990

:: RUNNING HMMs ...

Target found .................................: AA:GENE                                                                                                                                                                                       
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/03.db (v. 12)                                                                                                                              

WARNING
===============================================
You did not provide any gene caller ids. As a result, anvi'o will give you back
sequences for every 117 gene call stored in the contigs database.

Output .......................................: /tmp/tmp3o7lgl4l/AA_gene_sequences.fa                                                                                                                                                         
Target found .................................: RNA:CONTIG
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/03.db (v. 12)                                                                                                                              

HMM Profiling for Ribosomal_RNAs
===============================================
Reference ....................................: Seemann T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNAs
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N\A
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Ribosomal_RNAs/genes.hmm.gz
Number of genes ..............................: 12
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpgv_givm1
HMM scan output ..............................: /tmp/tmpgv_givm1/hmm.output
HMM scan hits ................................: /tmp/tmpgv_givm1/hmm.hits
Log file .....................................: /tmp/tmpgv_givm1/00_log.txt
Number of raw hits ...........................: 0                                                                                                                                                                                             

* The HMM source 'Ribosomal_RNAs' returned 0 hits. SAD (but it's stil OK).

HMM Profiling for Rinke_et_al
===============================================
Reference ....................................: Rinke et al, http://www.nature.com/nature/journal/v499/n7459/full/nature12352.html
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: archaea
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Rinke_et_al/genes.hmm.gz
Number of genes ..............................: 162
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmp7tcserlk
HMM scan output ..............................: /tmp/tmp7tcserlk/hmm.output
HMM scan hits ................................: /tmp/tmp7tcserlk/hmm.hits
Log file .....................................: /tmp/tmp7tcserlk/00_log.txt
Number of raw hits ...........................: 0                                                                                                                                                                                             

* The HMM source 'Rinke_et_al' returned 0 hits. SAD (but it's stil OK).

HMM Profiling for Campbell_et_al
===============================================
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: bacteria
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmp0du2p3wj
HMM scan output ..............................: /tmp/tmp0du2p3wj/hmm.output
HMM scan hits ................................: /tmp/tmp0du2p3wj/hmm.hits
Log file .....................................: /tmp/tmp0du2p3wj/00_log.txt
Number of raw hits ...........................: 6                                                                                                                                                                                             

HMM Profiling for BUSCO_83_Protista
===============================================
Reference ....................................: See Simao et al, doi:10.1093/bioinformatics/btv351, as well as http://merenlab.org/delmont-euk-scgs
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: eukarya
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/BUSCO_83_Protista/genes.hmm.gz
Number of genes ..............................: 83
Noise cutoff term(s) .........................: -E 1e-25
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpo_itpb42
HMM scan output ..............................: /tmp/tmpo_itpb42/hmm.output
HMM scan hits ................................: /tmp/tmpo_itpb42/hmm.hits
Log file .....................................: /tmp/tmpo_itpb42/00_log.txt
Number of raw hits ...........................: 0                                                                                                                                                                                             

* The HMM source 'BUSCO_83_Protista' returned 0 hits. SAD (but it's stil OK).

:: Importing functions into the contigs database ...

Gene functions ...............................: 494 function calls from 5 sources for 115 unique gene calls has been added to the contigs database.
Gene functions ...............................: 494 function calls from 5 sources for 115 unique gene calls has been added to the contigs database.
Gene functions ...............................: 484 function calls from 5 sources for 113 unique gene calls has been added to the contigs database.

:: Generating an anvi'o genomes storage ...

WARNING
===============================================
Good news! Anvi'o found all these functions that are common to all of your
genomes and will use them for downstream analyses and is very proud of you:
'GO_TERMS, EGGNOG_BACT, COG_FUNCTION, KEGG_PATHWAYS, COG_CATEGORY'.

Internal genomes .............................: 0 have been initialized.                                                                                                                                                                      
External genomes .............................: 3 found.                                                                                                                                                                                      

* g01 is stored with 119 genes (0 of which were partial)
* g02 is stored with 119 genes (0 of which were partial)                                                                                                                                                                                      
* g03 is stored with 117 genes (1 of which were partial)                                                                                                                                                                                      

The new genomes storage ......................: TEST-GENOMES.db (v6, signature: hashac7aa5ee)
Number of genomes ............................: 3 (internal: 0, external: 3)
Number of gene calls .........................: 355
Number of partial gene calls .................: 1

:: Running the pangenome anaysis with default parameters ...

WARNING
===============================================
If you publish results from this workflow, please do not forget to cite DIAMOND
(doi:10.1038/nmeth.3176), unless you use it with --use-ncbi-blast flag, and MCL
(http://micans.org/mcl/ and doi:10.1007/978-1-61779-361-5_15)

Genomes storage ..............................: Initialized (storage hash: hashac7aa5ee)
Num genomes in storage .......................: 3
Num genomes will be used .....................: 3
Pan database .................................: A new database, /tmp/tmp02a1zwsp/test-output/pan_test/TEST/TEST-PAN.db, has been created.
Exclude partial gene calls ...................: False

AA sequences FASTA ...........................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/combined-aas.fa

Num AA sequences reported ....................: 355
Num excluded gene calls ......................: 0
Unique AA sequences FASTA ....................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/combined-aas.fa.unique

WARNING
===============================================
You elected to use NCBI's blastp for amino acid sequence search. Running blastp
will be significantly slower than DIAMOND (although, anvi'o developers are
convinced that you *are* doing the right thing, so, kudos to you).

BLAST search db ..............................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/combined-aas.fa.unique
BLAST results ................................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/blast-search-results.txt
Min percent identity .........................: 0.0
Minbit .......................................: 0.5
Filtered search results ......................: 1,053 edges stored
MCL input ....................................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/mcl-input.txt
MCL inflation ................................: 2.0
MCL output ...................................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/mcl-clusters.txt
Number of gene clusters ......................: 121

CITATION
===============================================
Anvi'o will use 'muscle' by Edgar, doi:10.1093/nar/gkh340
(http://www.drive5.com/muscle) to align your sequences. If you publish your
findings, please do not forget to properly credit their work.

New data for 'items' in data group 'default'
===============================================
Data key "num_genomes_gene_cluster_has_hits" .: Predicted type: int
Data key "num_genes_in_gene_cluster" .........: Predicted type: int
Data key "max_num_paralogs" ..................: Predicted type: int
Data key "SCG" ...............................: Predicted type: int

WARNING
===============================================
You (or the programmer) asked anvi'o to NOT check the consistency of the names
of your items between your additional data and the pan database you are
attempting to update. So be it. Anvi'o will not check anything, but if things
don't look the way you expected them to look, you will not blame anvi'o for your
poorly prepared data, but choose between yourself or Obama.

NEW DATA
===============================================
Database .....................................: pan
Data group ...................................: default
Data table ...................................: items
New data keys ................................: num_genomes_gene_cluster_has_hits, num_genes_in_gene_cluster, max_num_paralogs, SCG.

gene clusters info ...........................: 121 gene_clusters stored in the database
New items order ..............................: "presence-absence:euclidean:ward" (type newick) has been added to the database...

WARNING
===============================================
Clustering for "presence-absence:euclidean:ward" is already in the database. It
will be replaced with the new content.

New items order ..............................: "presence-absence:euclidean:ward" (type newick) has been added to the database...
New items order ..............................: "frequency:euclidean:ward" (type newick) has been added to the database...
New items order ..............................: "Forced synteny <> g01:NA:NA" (type basic) has been added to the database...
New items order ..............................: "Forced synteny <> g02:NA:NA" (type basic) has been added to the database...
New items order ..............................: "Forced synteny <> g03:NA:NA" (type basic) has been added to the database...

New layer_orders data...
===============================================
Data key "gene_cluster presence absence" .....: Type: newick
Data key "gene_cluster frequencies" ..........: Type: newick

New order data added to the db for layer_orders : gene_cluster presence absence, gene_cluster frequencies.

New data for 'layers' in data group 'default'
===============================================
Data key "total_length" ......................: Predicted type: int
Data key "gc_content" ........................: Predicted type: float
Data key "percent_completion" ................: Predicted type: int
Data key "percent_redundancy" ................: Predicted type: int
Data key "num_genes" .........................: Predicted type: int
Data key "avg_gene_length" ...................: Predicted type: float
Data key "num_genes_per_kb" ..................: Predicted type: float
Data key "singleton_gene_clusters" ...........: Predicted type: int
Data key "num_gene_clusters" .................: Predicted type: int

NEW DATA
===============================================
Database .....................................: pan
Data group ...................................: default
Data table ...................................: layers
New data keys ................................: total_length, gc_content, percent_completion, percent_redundancy, num_genes, avg_gene_length, num_genes_per_kb, singleton_gene_clusters, num_gene_clusters.

Genomes storage .............................................: Initialized (storage hash: hashac7aa5ee)
Num genomes in storage ......................................: 3
Num genomes will be used ....................................: 3
Pan DB ......................................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/TEST-PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [NO]; Geometric: [NO]; Combined: [NO]

* Gene clusters are initialized for all 121 gene clusters in the database.

New data for 'items' in data group 'default'
===============================================
Data key "functional_homogeneity_index" ......: Predicted type: float
Data key "geometric_homogeneity_index" .......: Predicted type: float
Data key "combined_homogeneity_index" ........: Predicted type: float

WARNING
===============================================
You (or the programmer) asked anvi'o to NOT check the consistency of the names
of your items between your additional data and the pan database you are
attempting to update. So be it. Anvi'o will not check anything, but if things
don't look the way you expected them to look, you will not blame anvi'o for your
poorly prepared data, but choose between yourself or Obama.

NEW DATA
===============================================
Database .....................................: pan
Data group ...................................: default
Data table ...................................: items
New data keys ................................: functional_homogeneity_index, geometric_homogeneity_index, combined_homogeneity_index.

log file ....................................................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/log.txt

:: Running ANI on genomes and storing results in the PAN database ...

CITATION
===============================================
Anvi'o will use 'PyANI' by Pritchard et al. (DOI: 10.1039/C5AY02550H) to compute
ANI. If you publish your findings, please do not forget to properly credit their
work.

[PyANI] Num threads to use ...................: 1
[PyANI] Alignment method .....................: ANIb
[PyANI] Log file path ........................: /tmp/tmp02a1zwsp/test-output/pan_test/ANI_LOG.txt

Genomes found ................................: 3
Temporary FASTA output directory .............: /tmp/tmpz_j7gsli
Output directory .............................: /tmp/tmp02a1zwsp/test-output/pan_test/ANI_TEST

Config Error: PyANI returned with non-zero exit code, there may be some errors. please check
              the log file for details.

Config Error: According to the exit code ('255'), anvi'o suspects that something may have gone
              wrong while running your tests :/ We hope that the reason is clear to you from
              the lines above. But if you don't see anything obvious, and especially if the
              test ended up running until the end with reasonable looking final results, you
              shouldn't worry too much about this error. Life is short and we all can worry
              just a bit less.
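
Because anvi'o only reports PyANI's exit code here, the real cause has to be dug out of the file shown at "[PyANI] Log file path". A minimal sketch for pulling the final exception line out of such a log, assuming it ends with a standard Python traceback (as the ANI_LOG.txt in this report does). The function names are hypothetical helpers, not part of anvi'o or PyANI:

```python
import sys
from pathlib import Path


def last_exception_line(log_text):
    """Return the 'ExceptionType: message' line of the last traceback, or None."""
    lines = log_text.splitlines()
    # Locate the start of the most recent traceback in the log.
    start = None
    for i, line in enumerate(lines):
        if line.startswith("Traceback (most recent call last):"):
            start = i
    if start is None:
        return None
    # Stack frames are indented; the first non-indented, non-empty line
    # after them is the exception itself.
    for line in lines[start + 1:]:
        if line.strip() and not line.startswith((" ", "\t")):
            return line.strip()
    return None


def summarize_log(path):
    """Read a log file (hypothetical helper) and return its final exception line."""
    return last_exception_line(Path(path).read_text())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_log.py /path/to/ANI_LOG.txt
    print(summarize_log(sys.argv[1]) or "no traceback found")
```

For the log in this report, the helper would surface the AttributeError about DataFrame.from_csv rather than the generic exit code 255.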

The output of conda list:

# packages in environment at /home/sonin/.anaconda3/envs/anvio5:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
anvio                     5.5.0                         0    bioconda
anvio-minimal             5.5.0            py36hf3f1cc3_0    bioconda
appdirs                   1.4.3                      py_1    conda-forge
asn1crypto                0.24.0                py36_1003    conda-forge
atk                       2.32.0               haf93ef1_0    conda-forge
attrs                     19.1.0                     py_0    conda-forge
bcftools                  1.9                  ha228f0b_3    bioconda
biopython                 1.74             py36h516909a_0    conda-forge
blas                      2.11                   openblas    conda-forge
blast                     2.5.0                hc0b0e79_3    bioconda
blast-legacy              2.2.26                        2    bioconda
boost                     1.70.0           py36h9de70de_1    conda-forge
boost-cpp                 1.70.0               ha2d47e9_1    conda-forge
bottle                    0.12.13                    py_1    conda-forge
bowtie2                   2.3.5            py36he860b03_0    bioconda
bwa                       0.7.17               hed695b0_6    bioconda
bzip2                     1.0.8                h516909a_0    conda-forge
ca-certificates           2019.6.16            hecc5488_0    conda-forge
cairo                     1.16.0            h18b612c_1001    conda-forge
centrifuge                1.0.4_beta      py36pl526he941832_2    bioconda
certifi                   2019.6.16                py36_1    conda-forge
cffi                      1.12.3           py36h8022711_0    conda-forge
chardet                   3.0.4                 py36_1003    conda-forge
cherrypy                  8.0.0                    py36_0    conda-forge
colored                   1.3.93                     py_0    conda-forge
configargparse            0.13.0                     py_1    conda-forge
cryptography              2.5              py36hb7f436b_1    conda-forge
curl                      7.64.0               h646f8bb_0    conda-forge
cycler                    0.10.0                     py_1    conda-forge
cython                    0.29.13          py36he1b5a44_0    conda-forge
datrie                    0.8              py36h516909a_0    conda-forge
dbus                      1.13.6               he372182_0    conda-forge
diamond                   0.9.25               hfb76ee0_0    bioconda
django                    2.0.8                    py36_0    conda-forge
docutils                  0.15.2                   py36_0    conda-forge
ete3                      3.1.1                    py36_0    bioconda
expat                     2.2.5             he1b5a44_1003    conda-forge
fontconfig                2.13.1            he4413a7_1000    conda-forge
freetype                  2.10.0               he983fc9_1    conda-forge
fribidi                   1.0.5             h516909a_1002    conda-forge
gdk-pixbuf                2.32.2                        1    bioconda
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
glib                      2.58.3            h6f030ca_1002    conda-forge
gobject-introspection     1.58.2          py36h5503ade_1002    conda-forge
graphite2                 1.3.13            hf484d3e_1000    conda-forge
gsl                       2.4               h294904e_1006    conda-forge
gstreamer                 1.14.5               h36ae1b5_0    conda-forge
gtk2                      2.24.32              h90f3771_0    conda-forge
h5py                      2.9.0           nompi_py36h513d04c_1104    conda-forge
harfbuzz                  2.4.0                h37c48d4_1    conda-forge
hdf5                      1.10.5          nompi_h3c11f04_1103    conda-forge
hmmer                     3.2.1                hf484d3e_1    bioconda
htslib                    1.9                  ha228f0b_7    bioconda
icu                       58.2              hf484d3e_1000    conda-forge
idna                      2.7                   py36_1002    conda-forge
illumina-utils            2.6                        py_0    bioconda
iqtree                    1.6.12               he513fc3_0    bioconda
jpeg                      9c                h14c3975_1001    conda-forge
jsonschema                3.0.2                    py36_0    conda-forge
kiwisolver                1.1.0            py36hc9558a2_0    conda-forge
krb5                      1.16.3            hc83ff2d_1000    conda-forge
libblas                   3.8.0               11_openblas    conda-forge
libcblas                  3.8.0               11_openblas    conda-forge
libcurl                   7.64.0               h01ee5af_0    conda-forge
libdeflate                1.0                  h14c3975_1    bioconda
libedit                   3.1.20170329      hf8c457e_1001    conda-forge
libffi                    3.2.1             he1b5a44_1006    conda-forge
libgcc                    7.2.0                h69d50b8_2    conda-forge
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libiconv                  1.15              h516909a_1005    conda-forge
liblapack                 3.8.0               11_openblas    conda-forge
liblapacke                3.8.0               11_openblas    conda-forge
libopenblas               0.3.6                h6e990d7_6    conda-forge
libpng                    1.6.37               hed695b0_0    conda-forge
libssh2                   1.8.0             h1ad7b7a_1003    conda-forge
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.0.10            h57b8799_1003    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libxcb                    1.13              h14c3975_1002    conda-forge
libxml2                   2.9.9                h13577e0_2    conda-forge
libxslt                   1.1.32            hae48121_1003    conda-forge
lxml                      4.4.1            py36h7ec2d77_0    conda-forge
lz4-c                     1.8.3             he1b5a44_1001    conda-forge
matplotlib                2.2.3            py36h8a2030e_1    conda-forge
matplotlib-base           2.2.3            py36h60b886d_1    conda-forge
mcl                       14.137          pl526h470a237_4    bioconda
megahit                   1.2.8                h8b12597_0    bioconda
mistune                   0.8.1            py36h3d5977c_0  
mummer                    3.23                    pl526_8    bioconda
muscle                    3.8.1551             h6bb024c_4    bioconda
ncurses                   6.1               hf484d3e_1002    conda-forge
nose                      1.3.7                 py36_1002    conda-forge
numpy                     1.17.1           py36h95a1406_0    conda-forge
openblas                  0.3.6                h6e990d7_6    conda-forge
openssl                   1.0.2r               h14c3975_0    conda-forge
pandas                    0.25.1           py36hb3f55d8_0    conda-forge
pango                     1.42.4               ha030887_1    conda-forge
patsy                     0.5.1                      py_0    conda-forge
pcre                      8.41              hf484d3e_1003    conda-forge
perl                      5.26.2            h516909a_1006    conda-forge
pip                       19.2.3                   py36_0    conda-forge
pixman                    0.38.0            h516909a_1003    conda-forge
prodigal                  2.6.3                         1    bioconda
psutil                    5.4.3                    py36_0    conda-forge
pthread-stubs             0.4               h14c3975_1001    conda-forge
pyani                     0.2.7            py36h24bf2e0_1    bioconda
pycparser                 2.19                     py36_1    conda-forge
pyopenssl                 19.0.0                   py36_0    conda-forge
pyparsing                 2.4.2                      py_0    conda-forge
pyqt                      5.6.0           py36h13b7fb3_1008    conda-forge
pyrsistent                0.15.4           py36h516909a_0    conda-forge
pysam                     0.15.2           py36hb06f55c_2    bioconda
pysocks                   1.7.0                    py36_0    conda-forge
python                    3.6.7             hd21baee_1002    conda-forge
python-dateutil           2.8.0                      py_0    conda-forge
python-levenshtein        0.12.0                   pypi_0    pypi
pytz                      2019.2                     py_0    conda-forge
pyyaml                    5.1.2            py36h516909a_0    conda-forge
qt                        5.6.2             hce4f676_1013    conda-forge
ratelimiter               1.2.0                 py36_1000    conda-forge
readline                  7.0               hf8c457e_1001    conda-forge
requests                  2.20.0                py36_1000    conda-forge
samtools                  1.9                 h10a08f8_12    bioconda
scikit-learn              0.19.2           py36h22eb022_0  
scipy                     1.3.1            py36h921218d_2    conda-forge
seaborn                   0.9.0                      py_1    conda-forge
setuptools                41.2.0                   py36_0    conda-forge
sip                       4.18.1          py36hf484d3e_1000    conda-forge
six                       1.11.0                py36_1001    conda-forge
snakemake-minimal         5.2.4                    py36_0    bioconda
sqlite                    3.28.0               h8b20d00_0    conda-forge
statsmodels               0.9.0           py36h3010b51_1000    conda-forge
tabulate                  0.8.3                      py_0    conda-forge
tbb                       2019.8               hc9558a2_0    conda-forge
tk                        8.6.9             hed695b0_1002    conda-forge
tornado                   6.0.3            py36h516909a_0    conda-forge
trimal                    1.4.1                h6bb024c_3    bioconda
urllib3                   1.23                  py36_1001    conda-forge
wheel                     0.33.6                   py36_0    conda-forge
wrapt                     1.11.2           py36h516909a_0    conda-forge
xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
xorg-libice               1.0.10               h516909a_0    conda-forge
xorg-libsm                1.2.3             h84519dc_1000    conda-forge
xorg-libx11               1.6.8                h516909a_0    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xorg-libxext              1.3.4                h516909a_0    conda-forge
xorg-libxrender           0.9.10            h516909a_1002    conda-forge
xorg-libxt                1.2.0                h516909a_0    conda-forge
xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
xorg-xproto               7.0.31            h14c3975_1007    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
yaml                      0.1.7             h14c3975_1001    conda-forge
zlib                      1.2.11            h516909a_1005    conda-forge
zstd                      1.4.0                h3b9ef0a_0    conda-forge
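
The environment above pairs pandas 0.25.1 with pyani 0.2.7. DataFrame.from_csv was deprecated in pandas 0.21.0 and removed in 0.25.0, so any pandas at or above 0.25.0 will produce the AttributeError seen in the ANI log. A minimal sketch for flagging that from saved conda list output; the helper names are hypothetical, and the check looks only at the pandas version, not at which pyani release stopped calling from_csv:

```python
import re

# pandas deprecated DataFrame.from_csv in 0.21.0 and removed it in 0.25.0.
FROM_CSV_REMOVED_IN = (0, 25, 0)


def parse_conda_list(text):
    """Map package name -> version string from `conda list` output."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the '#' comment/header lines.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        packages[fields[0]] = fields[1]
    return packages


def version_tuple(version):
    """Turn the leading numeric part of a version, e.g. '0.25.1', into (0, 25, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def pandas_lacks_from_csv(packages):
    """True if the listed pandas is new enough that DataFrame.from_csv is gone."""
    version = packages.get("pandas")
    return version is not None and version_tuple(version) >= FROM_CSV_REMOVED_IN


if __name__ == "__main__":
    sample = """# Name                    Version                   Build  Channel
pandas                    0.25.1           py36hb3f55d8_0    conda-forge
pyani                     0.2.7            py36h24bf2e0_1    bioconda
"""
    pkgs = parse_conda_list(sample)
    print(pkgs["pandas"], pkgs["pyani"], pandas_lacks_from_csv(pkgs))
```

Comparing integer tuples rather than raw strings avoids the trap where "0.9" would sort after "0.25".
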

meren commented 4 years ago

We addressed this in master, and the fix will be available in v6 soon. Thank you very much, @andrewsonin.
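
For anyone stuck on v5.5 in the meantime, the failing call in pyani/anib.py (pd.DataFrame.from_csv(filename, header=None, sep='\t'), per the traceback in the original report) can be expressed with pd.read_csv. A minimal sketch, not the actual pyani or anvi'o patch; read_blast_tab is a hypothetical name. from_csv defaulted to index_col=0 and parse_dates=True; the sketch keeps index_col=0 and drops parse_dates on the assumption that the first BLAST column holds sequence IDs rather than dates:

```python
import io

import pandas as pd


def read_blast_tab(path_or_buffer):
    r"""Read a headerless, tab-separated BLAST table, indexed by its first column.

    Stand-in for pd.DataFrame.from_csv(filename, header=None, sep='\t'),
    which no longer exists in pandas >= 0.25.0.
    """
    return pd.read_csv(path_or_buffer, header=None, sep="\t", index_col=0)


if __name__ == "__main__":
    # Hypothetical table: query ID, subject ID, percent identity, alignment length.
    sample = io.StringIO("frag1\tg02\t99.5\t1020\nfrag2\tg02\t97.0\t980\n")
    table = read_blast_tab(sample)
    print(table.loc["frag1", 2])
```

With header=None and index_col=0, the remaining columns keep their original integer positions (1, 2, 3), which matches what from_csv produced for the same input.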