merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Display genomic context for genes in a protein cluster #726

Open brymerr921 opened 6 years ago

brymerr921 commented 6 years ago

Hi Anvi'o developers,

I've been using the pangenomic workflow a lot, but (1) knowing the genomic context of the genes that are a part of each protein cluster and (2) having quick access to the functions of genes in a protein cluster would be super useful. For instance, proteins that cluster together might look similar at the amino acid level but be involved in different bacterial pathways if the surrounding genes are different. This visualization will support analyses of bacterial pathways instead of just genes by themselves.

Here's a drawing that shows what I would find extremely helpful. I think it'd make sense as an additional part of the page that appears when I right click on a protein cluster in anvi-display-pan and choose "Inspect", or perhaps as a separate menu option after right-clicking.

The main ideas are:

As always, thanks for making Anvi'o great!

Bryan

[Image: anvi'o PC inspect page enhancement]

ozcan commented 6 years ago

Hi Bryan,

These are awesome ideas and we would like to implement them; the sketch explains them very well. We have already improved many things about the pangenomic workflow for v4, which we will release soon, including some changes to the inspect page (color outputs, popups with detailed information about gene calls), but I think adding genomic context will make it much better. I will work on this after releasing this version; hopefully, we can have these features ready for v5. I will comment here once we have a prototype on master.

All the best, Ozcan

brymerr921 commented 6 years ago

Ozcan,

Thanks, that is wonderful news! I'll be first in line to test it and provide feedback when it rolls out.

Some other thoughts bouncing around my head about this are:

  1. Trying to be unambiguous about what these "units of several genes" actually are. I've often heard a set of genes located near each other on a chromosome called a "gene cluster", but clearly a different term is needed in the context of Anvi'o.

  2. A specific way to annotate genes that belong to a defined functional cluster. At present, I can hack this by feeding a file to anvi-import-functions in which genes belonging to the same functional cluster are annotated with the same unique identifier (e.g. gene_cluster_1).

  3. Synteny-aware gene/protein clustering. Based on some user-defined parameters, it may be interesting to determine (while running anvi-pan-genome, or afterwards as some sort of filter) whether any genes in an Anvi'o gene (protein) cluster should be kicked out because their genomic context is different. For example, proteins A and B, which are members of the same protein cluster (high % identity, etc.), would have to pass the minbit, etc. thresholds, but would also need to have n genes in the same protein cluster within m genes of protein A and protein B.
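
To make the check in point 3 concrete, here is a minimal Python sketch of one way to read it. This is not anvi'o code and does not use any anvi'o API. The function names (`neighborhood`, `shares_context`) and the input layout (one list of protein cluster ids per genome, in gene order) are assumptions made purely for illustration. It also treats each genome as a single linear contig and ignores strand and circularity.

```python
from typing import List, Set


def neighborhood(order: List[str], index: int, m: int) -> Set[str]:
    """Protein cluster ids of the genes within m positions of the gene at
    `index`, excluding that gene itself. `order` lists one protein cluster
    id per gene, in the order the genes occur on the contig (hypothetical
    input layout, not an anvi'o data structure)."""
    lo = max(0, index - m)
    upstream = order[lo:index]
    downstream = order[index + 1:index + m + 1]
    return set(upstream) | set(downstream)


def shares_context(order_a: List[str], index_a: int,
                   order_b: List[str], index_b: int,
                   n: int, m: int) -> bool:
    """True if the two genes have at least n protein clusters in common
    among their neighbors within m genes on either side."""
    shared = neighborhood(order_a, index_a, m) & neighborhood(order_b, index_b, m)
    return len(shared) >= n


if __name__ == "__main__":
    # Toy genomes: the gene of interest belongs to PC3 in all three.
    genome_a = ["PC1", "PC2", "PC3", "PC4", "PC5"]
    genome_b = ["PC9", "PC2", "PC3", "PC4", "PC8"]
    genome_c = ["PC7", "PC6", "PC3", "PC10", "PC11"]

    # Same flanking clusters in A and B, different ones in C.
    print(shares_context(genome_a, 2, genome_b, 2, n=2, m=1))
    print(shares_context(genome_a, 2, genome_c, 2, n=2, m=1))
```

A gene that fails this test against most other members of its protein cluster would be a candidate for being kicked out; how to turn pairwise results into a per-cluster decision is left open here.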

Best, Bryan

brymerr921 commented 6 years ago

Hi, @ozcan, I was wondering if there are any updates on this front. Thanks!

meren commented 6 years ago

Hey Bryan,

We are finalizing v5, and clearly this feature will be for another spring :/ You will see from the release notes that it was a very busy period for anvi'o developers, and many outstanding features are still waiting to be implemented :/

Sorry.