Open Louis-MG opened 2 days ago
Hey @Louis-MG,
I don't think this program actually cares about the duplicate hashes. I just tested it with copy-pasta genomes without using --skip-checking-genome-hashes, and it worked fine :)
$ anvi-script-gen-function-matrix-across-genomes -e external-genomes.txt \
-G groups.txt \
--annotation-source KOfam \
--output-file-prefix functional_enrichment_all
Groups found and parsed ......................: E_faecali, E_faecium
WARNING
===============================================
Just FYI, for any gene call with multiple functional annotations from the same
source in a given genome, anvi'o only kept the annotation with the BEST e-value.
Keep this in mind when interpreting the output of this program.
Number of KOfam functions found across 2 groups : 1,545
Number of KOfam functions associated with all groups and SKIPPED : 1,113
Number of KOfam functions in final occurrence table : 432
CITATION
===============================================
This program will compute enrichment scores using an R script developed by Amy
Willis. You can find more information about it in the following paper: Shaiber,
Willis et al (https://doi.org/10.1186/s13059-020-02195-w). When you publish your
findings, please do not forget to properly credit this work. :)
AMY's ENRICHMENT ANALYSIS 🚀
===============================================
Functional occurrence stats input file path: : /var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmp4wy4yzrb/FUNC_OCCURENCE_STATS.txt
Functional enrichment output file path: .....: /Users/meren/Downloads/INFANT-GUT-TUTORIAL/additional-files/pangenomics/functional_enrichment_all-FUNCTIONAL-ENRICHMENT.txt
Temporary log file (use `--debug` to keep): .: /var/folders/gw/5mdblzs94gsb1ss44llgl3_h0000gn/T/tmpk3lz5hky
Functions across genomes (frequency) .........: /Users/meren/Downloads/INFANT-GUT-TUTORIAL/additional-files/pangenomics/functional_enrichment_all-FREQUENCY.txt
Functions across genomes (presence/absence) ..: /Users/meren/Downloads/INFANT-GUT-TUTORIAL/additional-files/pangenomics/functional_enrichment_all-PRESENCE-ABSENCE.txt
Only when I had literally identical genomes in two groups, anvi'o complained:
Groups found and parsed ......................: E_faecali, E_faecium
WARNING
===============================================
In an ideal world, each group would describe at least two layer names. It is not
the case for these groups: E_faecali, E_faecium. That is OK and anvi'o will
continue with this analysis, but if something goes wrong with your stats or
whatever, you will remember this moment and go like, "Hmm. That's why my
adjusted q-values are like one point zero 🤔".
WARNING
===============================================
Just FYI, for any gene call with multiple functional annotations from the same
source in a given genome, anvi'o only kept the annotation with the BEST e-value.
Keep this in mind when interpreting the output of this program.
Number of KOfam functions found across 2 groups : 1,246
Number of KOfam functions associated with all groups and SKIPPED : 1,246
Number of KOfam functions in final occurrence table : 0
Config Error: Something weird is happening here :( It seems every single function across your
genomes is associated with all groups you have defined. There is nothing much
anvi'o can work with here. If you think this is a mistake, please let us know.
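To make the failure mode above concrete, here is a minimal Python sketch of the kind of filter the log describes: any function associated with every group is skipped, so identical genomes placed in different groups leave nothing in the final table. This is not anvi'o's actual code. The function name `build_occurrence_table`, the genome and group names, and the KO accessions are all hypothetical and only serve the illustration.

```python
from collections import defaultdict


def build_occurrence_table(genome_functions, genome_groups):
    """Keep only functions that are NOT associated with every group.

    genome_functions: dict mapping genome name -> set of function accessions
    genome_groups:    dict mapping genome name -> group name
    Returns (kept, skipped_count), where kept maps each remaining function
    to the set of groups it was found in.
    """
    all_groups = set(genome_groups.values())

    # Record which groups each function shows up in.
    groups_per_function = defaultdict(set)
    for genome, functions in genome_functions.items():
        for function in functions:
            groups_per_function[function].add(genome_groups[genome])

    # A function present in all groups cannot distinguish them, so drop it.
    kept = {
        function: groups
        for function, groups in groups_per_function.items()
        if groups != all_groups
    }
    skipped_count = len(groups_per_function) - len(kept)
    return kept, skipped_count


# Hypothetical case 1: the same genome copied into two different groups.
shared = {"K00001", "K00002", "K00003"}
identical_genomes = {"copy_A": set(shared), "copy_B": set(shared)}
identical_groups = {"copy_A": "group_1", "copy_B": "group_2"}
kept_identical, skipped_identical = build_occurrence_table(
    identical_genomes, identical_groups
)

# Hypothetical case 2: two genomes that differ in part of their functions.
distinct_genomes = {
    "genome_1": {"K00001", "K00002"},
    "genome_2": {"K00001", "K00003"},
}
distinct_groups = {"genome_1": "group_1", "genome_2": "group_2"}
kept_distinct, skipped_distinct = build_occurrence_table(
    distinct_genomes, distinct_groups
)

print(len(kept_identical), skipped_identical)
print(sorted(kept_distinct), skipped_distinct)
```

In the first case every function is skipped and the table ends up empty, which mirrors the "1,246 found, 1,246 skipped, 0 in final occurrence table" numbers above. In the second case only the shared function is dropped.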
I'm having a hard time reproducing this :(
Can you explain EXACTLY how you ended up here? Perhaps you can send us the external genomes file you have?
Short description of the problem
Help message indicates that I should use an option that is then refused as unknown.
anvi'o version
dev, installed following the documentation. Last updated on Sept. 19 at 18:54 (Canadian time).
Detailed description of the issue
I want to duplicate a contigs database to compute functional enrichment across genomes. I want anvi'o to ignore the database hashes so that it accepts the duplicates. I believe you just forgot to implement the option :D