Closed ShaiberAlon closed 4 years ago
anvi-self-test -v
Anvi'o version ...............................: esther (v6.1-master)
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1
Installed using the "Following the active codebase" instructions on a Linux server (I guess that means I'm a wizard!).
Hello, I am getting the same error described in #1320, but I did already import the seed_eggNOG_ortholog column. I ran eggnog-mapper 2 successfully in a separate environment and imported the annotation file into anvi'o using anvi-script-run-eggnog-mapper with the --annotation flag.
Unlike @anzhangli84 in #1320, my seed_eggNOG_ortholog column is already populated. I've attached two samples of what my annotation files look like when they are imported into anvi'o using anvi-script-run-eggnog-mapper with the --annotation flag. Note that I changed the 'query name' values to 'g00001', 'g00002', etc. in the .emapper.annotations output files to address the following error:
Config Error: Gene caller ids found in this annotation file does not start with the expected
prefix. This is a historical glitch that is not quite easy to address
programmatically, so anvi'o asks you to add the expected prefix as the first
character of every gene call in your annotations file. This is the prefix what
you need to add manually to the very beginning of every line (anvi'o developers
are very sorry for this step): 'g'.
All of the original files had a unique 8-letter prefix before each gene number, which I changed to 'g' (I'm giving this detail because I'm not sure whether it's relevant to this problem).
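For anyone who wants to script that prefix change instead of editing by hand, here is a minimal Python sketch. It rests on assumptions that are not confirmed anywhere in this thread: that the query name is the first tab-separated field, that lines starting with '#' are comments or headers, and that the gene number is the trailing run of digits in the query name. The function names are hypothetical helpers, not part of anvi'o or eggnog-mapper.

```python
import re

# Assumption: the gene number is the trailing run of digits in the query name,
# and everything before it is the per-genome prefix to replace.
_TRAILING_DIGITS = re.compile(r"(\d+)$")


def fix_query_name(name, prefix="g"):
    """Return prefix + trailing digits of name; leave name unchanged if it has none."""
    match = _TRAILING_DIGITS.search(name)
    return f"{prefix}{match.group(1)}" if match else name


def fix_annotation_line(line, prefix="g"):
    """Rewrite the first tab-separated field of a single annotation line."""
    if line.startswith("#") or not line.strip():
        return line  # comment/header or blank line: pass through untouched
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line ending
    first, sep, rest = body.partition("\t")
    return fix_query_name(first, prefix) + sep + rest + ending


def fix_annotation_file(src_path, dst_path, prefix="g"):
    """Copy src_path to dst_path, rewriting the query name on every data line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(fix_annotation_line(line, prefix))
```

Zero padding in the gene number is kept as-is, which mirrors the manual 'g00001' style described above. Whether anvi'o needs anything beyond the 'g' prefix is not something this sketch checks.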
When I run anvi-get-enriched-functions-per-pan-group with --list-annotation-sources on the pangenome to be analyzed, I get the following output and am presented with KEGG, COG, eggNOG, etc. choices, so it seems that the functions have been imported properly:
anvi-get-enriched-functions-per-pan-group -p gracilibacteria-pan/Gracilibacteria_Pan-PAN.db -g gracilibacteria-GENOMES.db --list-annotation-sources -o sources_list2
Genomes storage .............................................: Initialized (storage hash: hashc2b6d240)
Num genomes in storage ......................................: 32
Num genomes will be used ....................................: 26
Pan DB ......................................................: Initialized: gracilibacteria-pan/Gracilibacteria_Pan-PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [YES]; Geometric: [YES]; Combined: [YES]
* Gene clusters are initialized for all 11721 gene clusters in the database.
Available functional annotation sources .....................: KEGG_PATHWAYS, EC_NUMBER, BiGG_Reactions, COG_CATEGORY, eggNOG_free_text, EGGNOG_BACT, GO_TERMS, BRITE, eggNOG_best_tax, KEGG_MODULE, Preferred_Name, KEGG_KO
But I get the following message when I actually run the full anvi-get-enriched-functions-per-pan-group script:
anvi-get-enriched-functions-per-pan-group -p gracilibacteria-pan/Gracilibacteria_Pan-PAN.db -g gracilibacteria-GENOMES.db --category source --annotation-source KEGG_MODULE -o GRACIL-PAN-enriched-functions-source.txt --functional-occurrence-table-output GRACIL-functions-occurrence-frequency.txt
Genomes storage .............................................: Initialized (storage hash: hashc2b6d240)
Num genomes in storage ......................................: 32
Num genomes will be used ....................................: 26
Pan DB ......................................................: Initialized: gracilibacteria-pan/Gracilibacteria_Pan-PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [YES]; Geometric: [YES]; Combined: [YES]
* Gene clusters are initialized for all 11721 gene clusters in the database.
Category ....................................................: source
Functional annotation source ................................: KEGG_MODULE
Exclude ungrouped ...........................................: False
Occurrence frequency of functions: ..........................: GRACIL-functions-occurrence-frequency.txt
Functional occurrence summary ...............................: /usr/local/scratch/MISC/jobaker/TMP/tmp9s_50_3n
Config Error: It looks like something went wrong during the functional enrichment analysis. We
don't know what happened, but this log file could contain some clues:
/usr/local/scratch/MISC/jobaker/TMP/tmp50ez11hp
The contents of the log file:
cat /usr/local/scratch/MISC/jobaker/TMP/tmp50ez11hp
# DATE: 06 Feb 20 15:33:34
# CMD LINE: anvi-script-run-functional-enrichment-stats --input /usr/local/scratch/MISC/jobaker/TMP/tmp9s_50_3n --output GRACIL-PAN-enriched-functions-source.txt
Parsed with column specification:
cols(
KEGG_MODULE = col_character(),
function_accession = col_logical(),
gene_clusters_ids = col_character(),
associated_groups = col_character(),
p_oral = col_double(),
p_environmental = col_double(),
p_unknown = col_double(),
N_oral = col_double(),
N_environmental = col_double(),
N_unknown = col_double()
)
Error in smooth.spline(lambda, pi0, df = smooth.df) :
missing or infinite values in inputs are not allowed
Calls: %>% ... mutate_impl -> <Anonymous> -> pi0est -> smooth.spline
Execution halted
Looking at my functional occurrence summary tmp file (tmp9s_50_3n.txt, attached), it looks like I still do not have values in my 'function_accession' column either. Since I did have the seed_eggNOG_ortholog column in my annotation file, I am wondering why this is not linking up, and what I need to do to get 'function_accession' values when I import annotations from eggnog-mapper.
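A quick way to confirm which columns of that summary file are empty is sketched below. It assumes the file is tab-delimited with a header row, which is a guess based on the column names in the log. `count_missing` is a hypothetical helper, not an anvi'o function. As far as I know, readr reports a column as col_logical() when it finds no non-missing values in it, which would be consistent with function_accession being entirely empty.

```python
import csv

# Values treated as "missing"; this set is an assumption, adjust as needed.
MISSING = {"", "NA", "NaN", "None"}


def count_missing(path, delimiter="\t"):
    """Return (total_rows, {column: rows whose value is empty or NA-like})."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        counts = {name: 0 for name in (reader.fieldnames or [])}
        total = 0
        for row in reader:
            total += 1
            for name in counts:
                value = row.get(name)
                # Short rows yield None for the columns they lack.
                if value is None or value.strip() in MISSING:
                    counts[name] += 1
    return total, counts


if __name__ == "__main__":
    # Path taken from the log output above.
    total, counts = count_missing("/usr/local/scratch/MISC/jobaker/TMP/tmp9s_50_3n")
    for name, n_missing in counts.items():
        print(f"{name}: {n_missing} of {total} rows missing")
```

If function_accession is missing in every row while the p_ and N_ columns are populated, that points at the import step and not at the enrichment script itself. The sketch only reports counts and does not explain why the values are absent.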
Thanks very much for your help!
GCA_008015855.1.emapper.annotations.fixed.txt Tm7x.emapper.annotations.fixed.txt
I assume this is resolved now :) Please correct me if I'm wrong.
Thanks!
Re: #1320