josuebarrera / GenEra

GenEra is a fast and easy-to-use command-line tool that estimates the age of the last common ancestor of protein-coding gene families.
GNU General Public License v3.0
bioinformatics comparative-genomics founder-events founder-gene gene-age gene-family genera genomics phylostratigraphy


GenEra

Introduction

GenEra is an easy-to-use and highly customizable command-line tool that estimates gene-family founder events (i.e., the age of the last common ancestor of protein-coding gene families) through the reimplementation of genomic phylostratigraphy (Domazet-Lošo et al., 2007).
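To make the idea behind genomic phylostratigraphy concrete, here is a minimal Python sketch of the core assignment step. It is an illustration only, not GenEra's actual implementation: the function name `assign_phylostratum`, the toy lineage and the hit lists are all hypothetical. It assumes that each species with a detectable homolog has already been mapped to the level of the query's lineage at which it shares a last common ancestor with the query species.

```python
from typing import Dict, Iterable, List


def assign_phylostratum(lineage: List[str], hit_lca_levels: Iterable[str]) -> str:
    """Return the oldest lineage level at which a homolog was detected.

    lineage: taxonomic lineage of the query species, ordered oldest to youngest.
    hit_lca_levels: for every species with a detectable homolog, the lineage
        level at which that species shares a last common ancestor with the query.
    """
    rank = {level: i for i, level in enumerate(lineage)}
    # With no hits outside the query, the gene is treated as species-specific.
    oldest = len(lineage) - 1
    for level in hit_lca_levels:
        oldest = min(oldest, rank[level])
    return lineage[oldest]


# Hypothetical toy data: a shortened lineage and made-up homology hits.
lineage = ["cellular organisms", "Eukaryota", "Opisthokonta", "Metazoa", "Homo sapiens"]
hits: Dict[str, List[str]] = {
    "geneA": ["Metazoa", "Eukaryota"],  # most distant hit shares an ancestor at Eukaryota
    "geneB": ["Homo sapiens"],          # only found in the query species
    "geneC": [],                        # no hits at all
}

ages = {gene: assign_phylostratum(lineage, levels) for gene, levels in hits.items()}
for gene, age in ages.items():
    print(gene, age)
```

The gene age is simply the deepest node in the query's lineage that is supported by at least one homolog, which is why a single distant hit is enough to push a gene family into an older phylostratum.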

As of v1.1.0, users can use Foldseek to search predicted protein structures against the AlphaFold DB for fast and sensitive structural alignments. Alternatively, users can reassess gene ages by running JackHMMER on top of DIAMOND. Be aware that this additional step significantly slows down the analysis.

Precomputed gene ages (or 'phylomaps') made using GenEra or from previous studies using other tools can be found here.

Documentation

We recommend that users consult the GenEra wiki for details on installation (via Conda or Docker), database setup, how to run GenEra, and the output files. The wiki also discusses potential downstream analyses that can be performed on the GenEra output.

Please cite the appropriate tools when using the dependencies of GenEra. These citations are valuable in furthering bioinformatics research.

The paper describing the method implemented in GenEra:

Barrera-Redondo, J., Lotharukpong, J.S., Drost, H.G., Coelho, S.M. (2023). Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra. Genome Biology, 24, 54. https://doi.org/10.1186/s13059-023-02895-z

Acknowledgement

We (Josué Barrera-Redondo, Jaruwatana Sodai Lotharukpong & Hajk-Georg Drost) would like to thank several individuals for making this project possible.

We gratefully thank Susana M. Coelho, the Max Planck Institute for Biology Tübingen and the Max Planck Society for hosting and facilitating this research. We thank Caroline M. Weisman for her helpful comments on how to analyze and interpret HDF probabilities of her software abSENSE. We thank the Max Planck Computing and Data Facility for access to and support of the HPC infrastructure, as well as the BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) (031A532B, 031A533A, 031A533B, 031A534A, 031A535A, 031A537A, 031A537B, 031A537C, 031A537D, 031A538A).

Lastly, we are very grateful to Alice Laigle, Erica Dinatale, Laura Piovani, Michael Borg, Alexandra Dallaire and all the early adopters for their testing and feedback.

Funding

This work was supported by the European Research Council Grant “THETYS” (Grant agreement ID 864038), the Alexander von Humboldt Foundation, the Gordon and Betty Moore Foundation, and the Max Planck Society.