I tried to test the pangenomic workflow with anvi-self-test --suite pangenomics and got an error. According to the log file, it is caused by a call to the deprecated static method pd.DataFrame.from_csv, which no longer exists in the installed pandas version.
Possible solutions:
Pin an older pandas version in your anvi'o conda environment. Just run:
conda install pandas=0.23.1. It works with the current anvi'o version (5.5.0).
If you get UnsatisfiableError: The following specifications were found to be incompatible with each other:, pin the highest of the minimum pandas versions listed in the error output instead.
Or, if you want to install anvi'o correctly in one step and without pain, just run:
conda create -n anvio5 -c bioconda -c conda-forge anvio=5.5.0 pandas=0.23.1. Please add this description here.
anvi-self-test --suite mini and anvi-self-test --suite pangenomics work fine in this case.
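Choosing "the maximum among the minimum versions" is easy to get wrong by eye, because version strings do not sort alphabetically ("0.9" would sort above "0.23"). A minimal sketch of the comparison follows. The version numbers are hypothetical values, not taken from a real conda error message, and highest_minimum is an illustrative helper name.

```python
def parse_version(text):
    """Turn '0.23.1' into (0, 23, 1) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def highest_minimum(minimums):
    """Return the largest version string from a list of lower bounds."""
    return max(minimums, key=parse_version)


# Hypothetical lower bounds read off the error output by hand.
minimums = ["0.9.1", "0.20", "0.23.1"]

# Build the pin to pass to conda.
print("conda install pandas=" + highest_minimum(minimums))
```

A plain max(minimums) would pick "0.9.1" here, which is why the versions are converted to integer tuples first.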
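Replace all the uses of pd.DataFrame.from_csv with pd.read_csv (in the traceback below, the failing call is in pyani/anib.py). The two are not exact drop-in equivalents: from_csv defaulted to index_col=0 and parse_dates=True, while read_csv does not, so pass those two arguments explicitly to keep the same behavior.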
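The replacement can be sketched as a small text rewrite. This is an illustration only, not a patch from the pyani or anvi'o authors: rewrite_line is a hypothetical helper, and it assumes a single-line call with no nested parentheses in the argument list. It adds index_col=0 and parse_dates=True unless the call already sets them, since those were the from_csv defaults.

```python
import re

# Matches a single-line call such as pd.DataFrame.from_csv(filename, sep=',').
# Assumption: the argument list contains no nested parentheses.
CALL = re.compile(r"\b(\w+)\.DataFrame\.from_csv\(([^()]*)\)")


def rewrite_line(line):
    """Rewrite from_csv calls on one source line into read_csv calls."""

    def repl(match):
        alias = match.group(1)          # usually 'pd'
        args = match.group(2).strip()   # original argument text
        extra = []
        # from_csv used these defaults; read_csv needs them spelled out.
        if "index_col" not in args:
            extra.append("index_col=0")
        if "parse_dates" not in args:
            extra.append("parse_dates=True")
        parts = ([args] if args else []) + extra
        return "{}.read_csv({})".format(alias, ", ".join(parts))

    return CALL.sub(repl, line)


# The failing line from the traceback in this report.
old = "data = pd.DataFrame.from_csv(filename, header=None, sep='\\t')"
print(rewrite_line(old))
```

Lines without a from_csv call pass through unchanged, so the helper can be applied to every line of a file. Review the result by hand before using it.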
My OS is Linux Mint 19.2.
I installed anvio as follows:
Anvi'o version ...............................: margaret (v5.5)
Profile DB version ...........................: 31
Contigs DB version ...........................: 12
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1
The output of cat /tmp/tmp02a1zwsp/test-output/pan_test/ANI_LOG.txt:
# DATE: 06 Sep 19 15:19:34
# CMD LINE: average_nucleotide_identity.py --outdir output --indir /tmp/tmpz_j7gsli --method ANIb --workers 1
Traceback (most recent call last):
  File "/home/sonin/.anaconda3/envs/anvio5/bin/average_nucleotide_identity.py", line 804, in <module>
    results = methods[args.method][0](infiles, org_lengths)
  File "/home/sonin/.anaconda3/envs/anvio5/bin/average_nucleotide_identity.py", line 574, in unified_anib
    fraglengths=fraglengths, mode=args.method)
  File "/home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/pyani/anib.py", line 382, in process_blast
    resultvals = parse_blast_tab(blastfile, fraglengths, mode)
  File "/home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/pyani/anib.py", line 431, in parse_blast_tab
    data = pd.DataFrame.from_csv(filename, header=None, sep='\t')
AttributeError: type object 'DataFrame' has no attribute 'from_csv'
The output of anvi-self-test --suite pangenomics:
:: Output directory ...
/tmp/tmp02a1zwsp/test-output
:: Anvo'o version ...
Anvi'o version ...............................: margaret (v5.5)
Profile DB version ...........................: 31
Contigs DB version ...........................: 12
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1
:: Setting up the pan analysis directory ...
:: Generating contigs databases for external genomes ...
:: INPUT DIR: /tmp/tmp02a1zwsp/test-output/pan_test, FNAME: 01 ...
:: RENAMING CONTIGS ...
Input ........................................: /tmp/tmp02a1zwsp/test-output/pan_test/01.fa
Output .......................................: /tmp/tmp02a1zwsp/test-output/pan_test/01-clean.fa
Minimum length ...............................: 0
Total num contigs ............................: 1
Total num nucleotides ........................: 139,930
Contigs removed ..............................: 0 (0.00% of all)
Nucleotides removed ..........................: 0 (0.00% of all)
Deflines simplified ..........................: True
:: GENERATING THE CONTIGS DB ...
Input FASTA file .............................: /tmp/tmp02a1zwsp/test-output/pan_test/01-clean.fa
Name .........................................: 01
Description ..................................: No description is given
Split Length .................................: 20,000
K-mer size ...................................: 4
Skip gene calling? ...........................: False
External gene calls provided? ................: None
Ignoring internal stop codons? ...............: False
Splitting pays attention to gene calls? ......: False
Finding ORFs in contigs
===============================================
Genes ........................................: /tmp/tmpaognu7p1/contigs.genes
Amino acid sequences .........................: /tmp/tmpaognu7p1/contigs.amino_acid_sequences
Log file .....................................: /tmp/tmpaognu7p1/00_log.txt
Result .......................................: Prodigal (v2.6.3) has identified 119 genes.
Contigs with at least one gene call ..........: 1 of 1 (100.0%)
Contigs database .............................: A new database, /tmp/tmp02a1zwsp/test-output/pan_test/01.db, has been created.
Number of contigs ............................: 1
Number of splits .............................: 7
Total number of nucleotides ..................: 139,930
Gene calling step skipped ....................: False
Splits broke genes (non-mindful mode) ........: True
Desired split length (what the user wanted) ..: 20,000
Average split length (wnat anvi'o gave back) .: 19,990
:: RUNNING HMMs ...
Target found .................................: AA:GENE
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/01.db (v. 12)
WARNING
===============================================
You did not provide any gene caller ids. As a result, anvi'o will give you back
sequences for every 119 gene call stored in the contigs database.
Output .......................................: /tmp/tmpu_0i4g8i/AA_gene_sequences.fa
Target found .................................: RNA:CONTIG
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/01.db (v. 12)
HMM Profiling for Ribosomal_RNAs
===============================================
Reference ....................................: Seemann T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNAs
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N\A
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Ribosomal_RNAs/genes.hmm.gz
Number of genes ..............................: 12
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpds6fx32c
HMM scan output ..............................: /tmp/tmpds6fx32c/hmm.output
HMM scan hits ................................: /tmp/tmpds6fx32c/hmm.hits
Log file .....................................: /tmp/tmpds6fx32c/00_log.txt
Number of raw hits ...........................: 0
* The HMM source 'Ribosomal_RNAs' returned 0 hits. SAD (but it's stil OK).
HMM Profiling for Rinke_et_al
===============================================
Reference ....................................: Rinke et al, http://www.nature.com/nature/journal/v499/n7459/full/nature12352.html
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: archaea
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Rinke_et_al/genes.hmm.gz
Number of genes ..............................: 162
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpqgakp09a
HMM scan output ..............................: /tmp/tmpqgakp09a/hmm.output
HMM scan hits ................................: /tmp/tmpqgakp09a/hmm.hits
Log file .....................................: /tmp/tmpqgakp09a/00_log.txt
Number of raw hits ...........................: 0
* The HMM source 'Rinke_et_al' returned 0 hits. SAD (but it's stil OK).
HMM Profiling for Campbell_et_al
===============================================
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: bacteria
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpu2hneyd0
HMM scan output ..............................: /tmp/tmpu2hneyd0/hmm.output
HMM scan hits ................................: /tmp/tmpu2hneyd0/hmm.hits
Log file .....................................: /tmp/tmpu2hneyd0/00_log.txt
Number of raw hits ...........................: 6
HMM Profiling for BUSCO_83_Protista
===============================================
Reference ....................................: See Simao et al, doi:10.1093/bioinformatics/btv351, as well as http://merenlab.org/delmont-euk-scgs
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: eukarya
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/BUSCO_83_Protista/genes.hmm.gz
Number of genes ..............................: 83
Noise cutoff term(s) .........................: -E 1e-25
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpl8g03ajx
HMM scan output ..............................: /tmp/tmpl8g03ajx/hmm.output
HMM scan hits ................................: /tmp/tmpl8g03ajx/hmm.hits
Log file .....................................: /tmp/tmpl8g03ajx/00_log.txt
Number of raw hits ...........................: 0
* The HMM source 'BUSCO_83_Protista' returned 0 hits. SAD (but it's stil OK).
:: INPUT DIR: /tmp/tmp02a1zwsp/test-output/pan_test, FNAME: 02 ...
:: RENAMING CONTIGS ...
Input ........................................: /tmp/tmp02a1zwsp/test-output/pan_test/02.fa
Output .......................................: /tmp/tmp02a1zwsp/test-output/pan_test/02-clean.fa
Minimum length ...............................: 0
Total num contigs ............................: 1
Total num nucleotides ........................: 139,930
Contigs removed ..............................: 0 (0.00% of all)
Nucleotides removed ..........................: 0 (0.00% of all)
Deflines simplified ..........................: True
:: GENERATING THE CONTIGS DB ...
Input FASTA file .............................: /tmp/tmp02a1zwsp/test-output/pan_test/02-clean.fa
Name .........................................: 02
Description ..................................: No description is given
Split Length .................................: 20,000
K-mer size ...................................: 4
Skip gene calling? ...........................: False
External gene calls provided? ................: None
Ignoring internal stop codons? ...............: False
Splitting pays attention to gene calls? ......: False
Finding ORFs in contigs
===============================================
Genes ........................................: /tmp/tmp5nqs9hin/contigs.genes
Amino acid sequences .........................: /tmp/tmp5nqs9hin/contigs.amino_acid_sequences
Log file .....................................: /tmp/tmp5nqs9hin/00_log.txt
Result .......................................: Prodigal (v2.6.3) has identified 119 genes.
Contigs with at least one gene call ..........: 1 of 1 (100.0%)
Contigs database .............................: A new database, /tmp/tmp02a1zwsp/test-output/pan_test/02.db, has been created.
Number of contigs ............................: 1
Number of splits .............................: 7
Total number of nucleotides ..................: 139,930
Gene calling step skipped ....................: False
Splits broke genes (non-mindful mode) ........: True
Desired split length (what the user wanted) ..: 20,000
Average split length (wnat anvi'o gave back) .: 19,990
:: RUNNING HMMs ...
Target found .................................: RNA:CONTIG
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/02.db (v. 12)
Target found .................................: AA:GENE
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/02.db (v. 12)
WARNING
===============================================
You did not provide any gene caller ids. As a result, anvi'o will give you back
sequences for every 119 gene call stored in the contigs database.
Output .......................................: /tmp/tmp162qmeo8/AA_gene_sequences.fa
HMM Profiling for Ribosomal_RNAs
===============================================
Reference ....................................: Seemann T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNAs
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N\A
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Ribosomal_RNAs/genes.hmm.gz
Number of genes ..............................: 12
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpg9e5skmk
HMM scan output ..............................: /tmp/tmpg9e5skmk/hmm.output
HMM scan hits ................................: /tmp/tmpg9e5skmk/hmm.hits
Log file .....................................: /tmp/tmpg9e5skmk/00_log.txt
Number of raw hits ...........................: 0
* The HMM source 'Ribosomal_RNAs' returned 0 hits. SAD (but it's stil OK).
HMM Profiling for Rinke_et_al
===============================================
Reference ....................................: Rinke et al, http://www.nature.com/nature/journal/v499/n7459/full/nature12352.html
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: archaea
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Rinke_et_al/genes.hmm.gz
Number of genes ..............................: 162
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpwno15se5
HMM scan output ..............................: /tmp/tmpwno15se5/hmm.output
HMM scan hits ................................: /tmp/tmpwno15se5/hmm.hits
Log file .....................................: /tmp/tmpwno15se5/00_log.txt
Number of raw hits ...........................: 0
* The HMM source 'Rinke_et_al' returned 0 hits. SAD (but it's stil OK).
HMM Profiling for Campbell_et_al
===============================================
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: bacteria
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmp3sv8zz_h
HMM scan output ..............................: /tmp/tmp3sv8zz_h/hmm.output
HMM scan hits ................................: /tmp/tmp3sv8zz_h/hmm.hits
Log file .....................................: /tmp/tmp3sv8zz_h/00_log.txt
Number of raw hits ...........................: 6
HMM Profiling for BUSCO_83_Protista
===============================================
Reference ....................................: See Simao et al, doi:10.1093/bioinformatics/btv351, as well as http://merenlab.org/delmont-euk-scgs
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: eukarya
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/BUSCO_83_Protista/genes.hmm.gz
Number of genes ..............................: 83
Noise cutoff term(s) .........................: -E 1e-25
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpgp4qf7ne
HMM scan output ..............................: /tmp/tmpgp4qf7ne/hmm.output
HMM scan hits ................................: /tmp/tmpgp4qf7ne/hmm.hits
Log file .....................................: /tmp/tmpgp4qf7ne/00_log.txt
Number of raw hits ...........................: 0
* The HMM source 'BUSCO_83_Protista' returned 0 hits. SAD (but it's stil OK).
:: INPUT DIR: /tmp/tmp02a1zwsp/test-output/pan_test, FNAME: 03 ...
:: RENAMING CONTIGS ...
Input ........................................: /tmp/tmp02a1zwsp/test-output/pan_test/03.fa
Output .......................................: /tmp/tmp02a1zwsp/test-output/pan_test/03-clean.fa
Minimum length ...............................: 0
Total num contigs ............................: 1
Total num nucleotides ........................: 139,930
Contigs removed ..............................: 0 (0.00% of all)
Nucleotides removed ..........................: 0 (0.00% of all)
Deflines simplified ..........................: True
:: GENERATING THE CONTIGS DB ...
Input FASTA file .............................: /tmp/tmp02a1zwsp/test-output/pan_test/03-clean.fa
Name .........................................: 03
Description ..................................: No description is given
Split Length .................................: 20,000
K-mer size ...................................: 4
Skip gene calling? ...........................: False
External gene calls provided? ................: None
Ignoring internal stop codons? ...............: False
Splitting pays attention to gene calls? ......: False
Finding ORFs in contigs
===============================================
Genes ........................................: /tmp/tmphsyesylc/contigs.genes
Amino acid sequences .........................: /tmp/tmphsyesylc/contigs.amino_acid_sequences
Log file .....................................: /tmp/tmphsyesylc/00_log.txt
Result .......................................: Prodigal (v2.6.3) has identified 117 genes.
Contigs with at least one gene call ..........: 1 of 1 (100.0%)
Contigs database .............................: A new database, /tmp/tmp02a1zwsp/test-output/pan_test/03.db, has been created.
Number of contigs ............................: 1
Number of splits .............................: 7
Total number of nucleotides ..................: 139,930
Gene calling step skipped ....................: False
Splits broke genes (non-mindful mode) ........: True
Desired split length (what the user wanted) ..: 20,000
Average split length (wnat anvi'o gave back) .: 19,990
:: RUNNING HMMs ...
Target found .................................: AA:GENE
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/03.db (v. 12)
WARNING
===============================================
You did not provide any gene caller ids. As a result, anvi'o will give you back
sequences for every 117 gene call stored in the contigs database.
Output .......................................: /tmp/tmp3o7lgl4l/AA_gene_sequences.fa
Target found .................................: RNA:CONTIG
Contigs DB ...................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/03.db (v. 12)
HMM Profiling for Ribosomal_RNAs
===============================================
Reference ....................................: Seemann T, https://github.com/tseemann/barrnap
Kind .........................................: Ribosomal_RNAs
Alphabet .....................................: RNA
Context ......................................: CONTIG
Domain .......................................: N\A
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Ribosomal_RNAs/genes.hmm.gz
Number of genes ..............................: 12
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpgv_givm1
HMM scan output ..............................: /tmp/tmpgv_givm1/hmm.output
HMM scan hits ................................: /tmp/tmpgv_givm1/hmm.hits
Log file .....................................: /tmp/tmpgv_givm1/00_log.txt
Number of raw hits ...........................: 0
* The HMM source 'Ribosomal_RNAs' returned 0 hits. SAD (but it's stil OK).
HMM Profiling for Rinke_et_al
===============================================
Reference ....................................: Rinke et al, http://www.nature.com/nature/journal/v499/n7459/full/nature12352.html
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: archaea
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Rinke_et_al/genes.hmm.gz
Number of genes ..............................: 162
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmp7tcserlk
HMM scan output ..............................: /tmp/tmp7tcserlk/hmm.output
HMM scan hits ................................: /tmp/tmp7tcserlk/hmm.hits
Log file .....................................: /tmp/tmp7tcserlk/00_log.txt
Number of raw hits ...........................: 0
* The HMM source 'Rinke_et_al' returned 0 hits. SAD (but it's stil OK).
HMM Profiling for Campbell_et_al
===============================================
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: bacteria
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Noise cutoff term(s) .........................: --cut_ga
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmp0du2p3wj
HMM scan output ..............................: /tmp/tmp0du2p3wj/hmm.output
HMM scan hits ................................: /tmp/tmp0du2p3wj/hmm.hits
Log file .....................................: /tmp/tmp0du2p3wj/00_log.txt
Number of raw hits ...........................: 6
HMM Profiling for BUSCO_83_Protista
===============================================
Reference ....................................: See Simao et al, doi:10.1093/bioinformatics/btv351, as well as http://merenlab.org/delmont-euk-scgs
Kind .........................................: singlecopy
Alphabet .....................................: AA
Context ......................................: GENE
Domain .......................................: eukarya
HMM model path ...............................: /home/sonin/.anaconda3/envs/anvio5/lib/python3.6/site-packages/anvio/data/hmm/BUSCO_83_Protista/genes.hmm.gz
Number of genes ..............................: 83
Noise cutoff term(s) .........................: -E 1e-25
Number of CPUs will be used for search .......: 1
Temporary work dir ...........................: /tmp/tmpo_itpb42
HMM scan output ..............................: /tmp/tmpo_itpb42/hmm.output
HMM scan hits ................................: /tmp/tmpo_itpb42/hmm.hits
Log file .....................................: /tmp/tmpo_itpb42/00_log.txt
Number of raw hits ...........................: 0
* The HMM source 'BUSCO_83_Protista' returned 0 hits. SAD (but it's stil OK).
:: Importing functions into the contigs database ...
Gene functions ...............................: 494 function calls from 5 sources for 115 unique gene calls has been added to the contigs database.
Gene functions ...............................: 494 function calls from 5 sources for 115 unique gene calls has been added to the contigs database.
Gene functions ...............................: 484 function calls from 5 sources for 113 unique gene calls has been added to the contigs database.
:: Generating an anvi'o genomes storage ...
WARNING
===============================================
Good news! Anvi'o found all these functions that are common to all of your
genomes and will use them for downstream analyses and is very proud of you:
'GO_TERMS, EGGNOG_BACT, COG_FUNCTION, KEGG_PATHWAYS, COG_CATEGORY'.
Internal genomes .............................: 0 have been initialized.
External genomes .............................: 3 found.
* g01 is stored with 119 genes (0 of which were partial)
* g02 is stored with 119 genes (0 of which were partial)
* g03 is stored with 117 genes (1 of which were partial)
The new genomes storage ......................: TEST-GENOMES.db (v6, signature: hashac7aa5ee)
Number of genomes ............................: 3 (internal: 0, external: 3)
Number of gene calls .........................: 355
Number of partial gene calls .................: 1
:: Running the pangenome anaysis with default parameters ...
WARNING
===============================================
If you publish results from this workflow, please do not forget to cite DIAMOND
(doi:10.1038/nmeth.3176), unless you use it with --use-ncbi-blast flag, and MCL
(http://micans.org/mcl/ and doi:10.1007/978-1-61779-361-5_15)
Genomes storage ..............................: Initialized (storage hash: hashac7aa5ee)
Num genomes in storage .......................: 3
Num genomes will be used .....................: 3
Pan database .................................: A new database, /tmp/tmp02a1zwsp/test-output/pan_test/TEST/TEST-PAN.db, has been created.
Exclude partial gene calls ...................: False
AA sequences FASTA ...........................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/combined-aas.fa
Num AA sequences reported ....................: 355
Num excluded gene calls ......................: 0
Unique AA sequences FASTA ....................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/combined-aas.fa.unique
WARNING
===============================================
You elected to use NCBI's blastp for amino acid sequence search. Running blastp
will be significantly slower than DIAMOND (although, anvi'o developers are
convinced that you *are* doing the right thing, so, kudos to you).
BLAST search db ..............................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/combined-aas.fa.unique
BLAST results ................................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/blast-search-results.txt
Min percent identity .........................: 0.0
Minbit .......................................: 0.5
Filtered search results ......................: 1,053 edges stored
MCL input ....................................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/mcl-input.txt
MCL inflation ................................: 2.0
MCL output ...................................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/mcl-clusters.txt
Number of gene clusters ......................: 121
CITATION
===============================================
Anvi'o will use 'muscle' by Edgar, doi:10.1093/nar/gkh340
(http://www.drive5.com/muscle) to align your sequences. If you publish your
findings, please do not forget to properly credit their work.
New data for 'items' in data group 'default'
===============================================
Data key "num_genomes_gene_cluster_has_hits" .: Predicted type: int
Data key "num_genes_in_gene_cluster" .........: Predicted type: int
Data key "max_num_paralogs" ..................: Predicted type: int
Data key "SCG" ...............................: Predicted type: int
WARNING
===============================================
You (or the programmer) asked anvi'o to NOT check the consistency of the names
of your items between your additional data and the pan database you are
attempting to update. So be it. Anvi'o will not check anything, but if things
don't look the way you expected them to look, you will not blame anvi'o for your
poorly prepared data, but choose between yourself or Obama.
NEW DATA
===============================================
Database .....................................: pan
Data group ...................................: default
Data table ...................................: items
New data keys ................................: num_genomes_gene_cluster_has_hits, num_genes_in_gene_cluster, max_num_paralogs, SCG.
gene clusters info ...........................: 121 gene_clusters stored in the database
New items order ..............................: "presence-absence:euclidean:ward" (type newick) has been added to the database...
WARNING
===============================================
Clustering for "presence-absence:euclidean:ward" is already in the database. It
will be replaced with the new content.
New items order ..............................: "presence-absence:euclidean:ward" (type newick) has been added to the database...
New items order ..............................: "frequency:euclidean:ward" (type newick) has been added to the database...
New items order ..............................: "Forced synteny <> g01:NA:NA" (type basic) has been added to the database...
New items order ..............................: "Forced synteny <> g02:NA:NA" (type basic) has been added to the database...
New items order ..............................: "Forced synteny <> g03:NA:NA" (type basic) has been added to the database...
New layer_orders data...
===============================================
Data key "gene_cluster presence absence" .....: Type: newick
Data key "gene_cluster frequencies" ..........: Type: newick
New order data added to the db for layer_orders : gene_cluster presence absence, gene_cluster frequencies.
New data for 'layers' in data group 'default'
===============================================
Data key "total_length" ......................: Predicted type: int
Data key "gc_content" ........................: Predicted type: float
Data key "percent_completion" ................: Predicted type: int
Data key "percent_redundancy" ................: Predicted type: int
Data key "num_genes" .........................: Predicted type: int
Data key "avg_gene_length" ...................: Predicted type: float
Data key "num_genes_per_kb" ..................: Predicted type: float
Data key "singleton_gene_clusters" ...........: Predicted type: int
Data key "num_gene_clusters" .................: Predicted type: int
NEW DATA
===============================================
Database .....................................: pan
Data group ...................................: default
Data table ...................................: layers
New data keys ................................: total_length, gc_content, percent_completion, percent_redundancy, num_genes, avg_gene_length, num_genes_per_kb, singleton_gene_clusters, num_gene_clusters.
Genomes storage .............................................: Initialized (storage hash: hashac7aa5ee)
Num genomes in storage ......................................: 3
Num genomes will be used ....................................: 3
Pan DB ......................................................: Initialized: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/TEST-PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [NO]; Geometric: [NO]; Combined: [NO]
* Gene clusters are initialized for all 121 gene clusters in the database.
New data for 'items' in data group 'default'
===============================================
Data key "functional_homogeneity_index" ......: Predicted type: float
Data key "geometric_homogeneity_index" .......: Predicted type: float
Data key "combined_homogeneity_index" ........: Predicted type: float
WARNING
===============================================
You (or the programmer) asked anvi'o to NOT check the consistency of the names
of your items between your additional data and the pan database you are
attempting to update. So be it. Anvi'o will not check anything, but if things
don't look the way you expected them to look, you will not blame anvi'o for your
poorly prepared data, but choose between yourself or Obama.
NEW DATA
===============================================
Database .....................................: pan
Data group ...................................: default
Data table ...................................: items
New data keys ................................: functional_homogeneity_index, geometric_homogeneity_index, combined_homogeneity_index.
log file ....................................................: /tmp/tmp02a1zwsp/test-output/pan_test/TEST/log.txt
:: Running ANI on genomes and storing results in the PAN database ...
CITATION
===============================================
Anvi'o will use 'PyANI' by Pritchard et al. (DOI: 10.1039/C5AY02550H) to compute
ANI. If you publish your findings, please do not forget to properly credit their
work.
[PyANI] Num threads to use ...................: 1
[PyANI] Alignment method .....................: ANIb
[PyANI] Log file path ........................: /tmp/tmp02a1zwsp/test-output/pan_test/ANI_LOG.txt
Genomes found ................................: 3
Temporary FASTA output directory .............: /tmp/tmpz_j7gsli
Output directory .............................: /tmp/tmp02a1zwsp/test-output/pan_test/ANI_TEST
Config Error: PyANI returned with non-zero exit code, there may be some errors. please check
the log file for details.
Config Error: According to the exit code ('255'), anvi'o suspects that something may have gone
wrong while running your tests :/ We hope that the reason is clear to you from
the lines above. But if you don't see anything obvious, and especially if the
test ended up running until the end with reasonable looking final results, you
shouldn't worry too much about this error. Life is short and we all can worry
just a bit less.
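A note on the second suggestion from the description (swapping `pd.DataFrame.from_csv` for `pd.read_csv`): the two accept the same arguments, but according to the pandas documentation they differ in two defaults. `from_csv` used `index_col=0` and `parse_dates=True`, while `read_csv` uses `index_col=None` and `parse_dates=False`. So a call that relied on those defaults probably needs them spelled out after the swap. Below is a minimal sketch of that idea; the helper name `read_like_from_csv` and the sample data are made up for illustration and are not part of anvi'o or PyANI.

```python
import io

import pandas as pd


def read_like_from_csv(source, **kwargs):
    """Hypothetical helper: call pd.read_csv with the defaults that the
    deprecated pd.DataFrame.from_csv used (index_col=0, parse_dates=True).
    Arguments passed explicitly by the caller are left untouched."""
    kwargs.setdefault("index_col", 0)
    kwargs.setdefault("parse_dates", True)
    return pd.read_csv(source, **kwargs)


# Made-up tab-separated data, only to show the difference in defaults.
text = "date\tlength\tgc\n2019-09-06\t1000\t0.41\n2019-09-07\t1200\t0.39\n"

# Plain read_csv: the first column stays a regular column.
plain = pd.read_csv(io.StringIO(text), sep="\t")

# from_csv-like defaults: the first column becomes a parsed date index.
compat = read_like_from_csv(io.StringIO(text), sep="\t")

print(plain.shape)         # (2, 3)
print(compat.shape)        # (2, 2)
print(type(compat.index))  # a DatetimeIndex
```

Calls that already pass `index_col` and `parse_dates` explicitly should behave the same with either function, so only the calls relying on the defaults would need this kind of attention.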