josuebarrera / GenEra

GenEra is a fast and easy-to-use command-line tool that estimates the age of the last common ancestor of protein-coding gene families.
GNU General Public License v3.0

Comparison of protein families from multiple species #5

Closed by JSSaini 1 year ago

JSSaini commented 1 year ago

Thank you for the intriguing tool. I have predicted the functional annotations (amino acid sequences) of several microbial eukaryotes from genomic data, and I would like to compare their gene families. Do you think this is within the scope of the tool, for example comparing the protein sequences from multiple species?

josuebarrera commented 1 year ago

Dear Jaspreet, Thank you for your interest in using GenEra!

The purpose of our tool is to trace back distant homologs for each protein in a genome to calculate the relative ages of genes and gene families and to detect genes that are restricted to specific evolutionary lineages (taxonomically-restricted genes). In your case, you could detect whether a gene family emerged in the last common ancestor that is shared between all your microbial eukaryotes, or if a gene is only found in one of your genomes and nowhere else in the tree of life.

If you have a phylogeny that harbors your species of interest, as well as other distantly related taxa, you can also extract the evolutionary distances from the tree branches to test whether the age of your genes can be explained by homology detection failure (see Weisman et al. 2020 for an in-depth view on this topic). You can check out Figure 4 and Supplemental Figure 2 of this preprint to have a better grasp of the analysis you would be able to perform with GenEra.

If you are more interested in the expansion and contraction dynamics of gene families across your genomes, you can also use CAFE. Just bear in mind that CAFE gives limited information about taxonomically-restricted genes, so both analyses can be regarded as complementary to each other.

Good luck with all your research! Best, Josué.
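As a rough illustration of the relative-age idea described in this reply, here is a minimal Python sketch. It is not GenEra's actual code or interface: the function names, the input layout (lineages as root-to-tip lists of taxon names), and the taxa `GenusX`, `species_a`, and `species_b` are all hypothetical. The assumption is that each gene is dated to the oldest taxon it shares with its most distant detected homolog, and that a gene with no homologs outside its own genome is assigned to the species itself.

```python
from typing import Dict, List, Sequence


def relative_age(query_lineage: Sequence[str],
                 hit_lineages: List[Sequence[str]]) -> str:
    """Return the oldest taxon of the query lineage shared with any homolog.

    Lineages are ordered from root to tip. With no usable hits, the gene is
    assigned to the last (most specific) taxon of the query lineage.
    """
    oldest = len(query_lineage) - 1  # default: restricted to the query species
    for hit in hit_lineages:
        # Count how many leading taxa the query and the hit have in common.
        shared = 0
        for query_taxon, hit_taxon in zip(query_lineage, hit):
            if query_taxon != hit_taxon:
                break
            shared += 1
        if shared == 0:
            continue  # no common taxon at all, so the hit cannot be placed
        # The last shared taxon is the common ancestor with this hit;
        # a smaller index means an older ancestor.
        oldest = min(oldest, shared - 1)
    return query_lineage[oldest]


def genome_restricted(query_lineage: Sequence[str],
                      hits_per_gene: Dict[str, List[Sequence[str]]]) -> List[str]:
    """List genes whose relative age is the query species itself."""
    species = query_lineage[-1]
    return sorted(gene for gene, hits in hits_per_gene.items()
                  if relative_age(query_lineage, hits) == species)


# Hypothetical example data: three genes from one genome.
lineage = ["cellular organisms", "Eukaryota", "GenusX", "species_a"]
hits = {
    "gene1": [["cellular organisms", "Bacteria", "some_bacterium"]],
    "gene2": [["cellular organisms", "Eukaryota", "GenusX", "species_b"]],
    "gene3": [],  # no homologs detected outside the genome
}
ages = {gene: relative_age(lineage, h) for gene, h in hits.items()}
restricted = genome_restricted(lineage, hits)
```

In this toy input, `gene3` is the only candidate for a genome-restricted gene, while `gene2` would be dated to the genus. In a real analysis such candidates would still need the homology detection failure test mentioned above before being interpreted.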

JSSaini commented 1 year ago

Hi Josue, Thank you for getting back to me and for taking the time to answer my query. I am interested in genes or gene families that are found in only one of my genomes. I also have their functional annotations, including the protein-coding gene counts for each species. Given that all of my microbial eukaryotic genomes belong to the same genus, I may not see a dramatic change. Having the relative ages of the genes on top of that would be a plus, but it is not a priority. :)