Closed: meren closed this issue 6 years ago
Sounds good to me! I'll start working on it.
Self note: pangenomes without homogeneity indices break anvi-display-pan
and we need to address this before the minor release.
The documentation is updated, and the reminder above is fixed with #991 and following commits.
Thanks to @mahmoudyousef98's excellent addition to the codebase (see #977), the anvi'o pangenomic workflow has two new indices for investigating homogeneity within gene clusters.
There are at least two necessary steps going forward: including some information about this powerful addition in our documentation, and making it available for filtering gene clusters.
Documentation
We need to update our tutorial here:
http://merenlab.org/2016/11/08/pangenomics-v2/
Perhaps we can add a new section dedicated to homogeneity indices right after this one:
http://merenlab.org/2016/11/08/pangenomics-v2/#inspecting-gene-clusters
It will help others to better understand what those new layers are.
Implementing new filters
We currently allow users to filter their gene clusters based on various metrics. This is done through a function that is used both by the program anvi-get-sequences-for-gene-clusters (see the "ADVANCED FILTERS" section in the help menu) and by the user interface.

If a pangenome includes functional or geometric homogeneity indices in its additional misc data tables, then we could also make these filters available through both the interface and the command line program. A query like this would then be possible: "Give me all gene clusters that occur in all genomes with a geometric homogeneity >= 1.0 and a functional homogeneity < 1.0". This way a user can get all non-identical core gene clusters at once.
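To make the query above concrete, here is a minimal sketch of the filtering logic in plain Python. It does not use the anvi'o codebase: the `GeneCluster` record, its field names, the `filter_gene_clusters` function and its parameters are all hypothetical and only illustrate how the two indices could be combined with the existing "occurs in N genomes" criterion.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class GeneCluster:
    # Hypothetical record; field names are illustrative, not anvi'o's schema.
    name: str
    num_genomes: int               # number of genomes the cluster occurs in
    functional_homogeneity: float  # assumed to lie between 0.0 and 1.0
    geometric_homogeneity: float   # assumed to lie between 0.0 and 1.0


def filter_gene_clusters(
    clusters: Iterable[GeneCluster],
    min_num_genomes: int = 0,
    min_geometric: Optional[float] = None,     # inclusive lower bound
    below_functional: Optional[float] = None,  # exclusive upper bound
) -> List[GeneCluster]:
    """Keep only the gene clusters that satisfy every requested criterion."""
    kept = []
    for gc in clusters:
        if gc.num_genomes < min_num_genomes:
            continue
        if min_geometric is not None and gc.geometric_homogeneity < min_geometric:
            continue
        if below_functional is not None and gc.functional_homogeneity >= below_functional:
            continue
        kept.append(gc)
    return kept


# Toy pangenome with three genomes (made-up values for illustration only).
total_genomes = 3
clusters = [
    GeneCluster("GC_0001", 3, 1.00, 1.00),  # core, fully homogeneous
    GeneCluster("GC_0002", 3, 0.92, 1.00),  # core, non-identical
    GeneCluster("GC_0003", 2, 0.80, 1.00),  # not present in all genomes
    GeneCluster("GC_0004", 3, 0.85, 0.97),  # core, geometric index below 1.0
]

# "All gene clusters that occur in all genomes with a geometric
# homogeneity >= 1.0 and a functional homogeneity < 1.0."
selected = filter_gene_clusters(
    clusters,
    min_num_genomes=total_genomes,
    min_geometric=1.0,
    below_functional=1.0,
)
print([gc.name for gc in selected])
```

In this sketch every criterion is optional, so the same function could serve both the command line program and the interface, each passing only the bounds the user actually set.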
What do you think @mahmoudyousef98?