josuebarrera / GenEra

GenEra is a fast and easy-to-use command-line tool that estimates the age of the last common ancestor of protein-coding gene families.
GNU General Public License v3.0

May I use GenEra to study the genomic phylostratigraphy of Mycobacterium tuberculosis? #2

Closed TaoCheng98 closed 2 years ago

TaoCheng98 commented 2 years ago

Hi,

GenEra is so great that I would like to use it to study the genomic phylostratigraphy of Mycobacterium tuberculosis.

GenEra appears to have been designed for eukaryotes. Can it also be used for prokaryotes?

Best,

Chengtao

josuebarrera commented 2 years ago

Dear Chengtao,

Thank you for your interest in using GenEra! Technically, the method implemented in GenEra can be used to analyze any organism. However, the software uses the NCBI NR database by default. The eukaryote genomes in NR contain a considerable amount of prokaryote contamination. This should not affect phylostratigraphy in eukaryotes: if a matching gene is found in a bacterial contaminant, chances are that the gene would be traced back to LUCA either way. It does pose a problem for prokaryote phylostratigraphy, because some apparent eukaryote matches might actually be bacterial contamination.

What I suggest is that you use a prokaryote database that can be traced back to the NCBI taxonomy as the default database (e.g., RefSeq for prokaryotes), and then feed GenEra a selected list of high-quality eukaryote proteomes, such as the NCBI landmark genomes (https://blast.ncbi.nlm.nih.gov/smartblast/smartBlast.cgi?CMD=Web&PAGE_TYPE=BlastDocs), by using the -a flag, alongside any other proteomes of interest for your analysis.

You can always contact me by email (josue.barrera@tuebingen.mpg.de) if you need any help setting up the database.

Best, Josué.
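For anyone following the suggestion above, here is a minimal Python sketch of how the list of extra eukaryote proteomes for the -a flag might be prepared. It assumes that -a expects a tab-delimited table with the path to a protein FASTA file in the first column and its NCBI taxonomy ID in the second; please check that layout against the GenEra README before relying on it. The file names, the `proteomes` directory, and the function name `write_proteome_table` are hypothetical, and the three species are only illustrative picks, not the full landmark set.

```python
import csv
from pathlib import Path

# Hypothetical FASTA file names mapped to NCBI taxonomy IDs.
# 9606 = Homo sapiens, 559292 = Saccharomyces cerevisiae S288C,
# 3702 = Arabidopsis thaliana.
EXTRA_PROTEOMES = {
    "Homo_sapiens.faa": 9606,
    "Saccharomyces_cerevisiae.faa": 559292,
    "Arabidopsis_thaliana.faa": 3702,
}


def write_proteome_table(proteomes, proteome_dir, out_path):
    """Write one 'fasta_path<TAB>taxid' row per proteome and return the rows.

    ASSUMPTION: this two-column layout is what the -a flag expects.
    """
    rows = []
    for filename, taxid in sorted(proteomes.items()):
        # int() rejects anything that is not a valid integer taxid.
        rows.append((str(Path(proteome_dir) / filename), str(int(taxid))))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    table = write_proteome_table(EXTRA_PROTEOMES, "proteomes", "extra_proteomes.tsv")
    for row in table:
        print("\t".join(row))
```

The resulting `extra_proteomes.tsv` would then be passed to GenEra with -a, while the default database is pointed at the prokaryote-only database described above.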