Closed Aiswarya-prasad closed 2 years ago
The marker genes are listed here: https://github.com/snayfach/MIDAS/blob/master/midas/build/phyeco.hmm and are described in the paper. They are 15 universally distributed, single-copy, protein-coding gene families. Let me know if you run into any issues with building the database.
I should add - those 15 genes are identified in the genomes you provide as input.
Is this done in a data-dependent way? That is, are the same genes searched for in any genome, or are they 15 genes that happen to be in the given database? I also find a different number of marker genes for each species-id group.
Each of the 15 genes is represented by a profile HMM, and those HMMs are what is searched against a new genome when you build a custom database. All 15 should be found in any new bacterial/archaeal genome, provided it is near-complete.
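If it helps to check why a species group ends up with fewer than 15 markers, here is a rough sketch for tallying which families were hit in one genome. It is not MIDAS's own code. It assumes you ran HMMER's `hmmsearch` yourself with `--tblout` against the genome's predicted proteins, so that each data row starts with target name, target accession, query (HMM) name, query accession, and full-sequence E-value. The function names, the family names `famA`/`famB`/`famC`, and the `1e-5` E-value cutoff are all made up for illustration; MIDAS may apply different cutoffs.

```python
from collections import defaultdict


def marker_hits(tblout_text, max_evalue=1e-5):
    """Map each HMM (query) name to the set of genes (targets) that hit it.

    max_evalue is an illustrative cutoff, not a value taken from MIDAS.
    """
    hits = defaultdict(set)
    for line in tblout_text.splitlines():
        # Skip blank lines and the '#' comment/header lines in tblout files.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits[query].add(target)
    return dict(hits)


def summarize(hits, expected_families):
    """Return (families with no hit, families hit by more than one gene)."""
    missing = sorted(f for f in expected_families if f not in hits)
    multi = sorted(f for f, genes in hits.items() if len(genes) > 1)
    return missing, multi


# Made-up rows in tblout column order (only the first five columns are used).
example = """\
# target name  accession  query name  accession  E-value  score  bias
gene_0001 - famA - 1e-50 170.2 0.1
gene_0417 - famB - 3e-40 140.0 0.0
gene_0990 - famB - 2e-12 45.3 0.2
gene_1200 - famC - 0.5 5.0 0.0
"""

hits = marker_hits(example)
missing, multi = summarize(hits, ["famA", "famB", "famC"])
print(missing)  # ['famC']
print(multi)    # ['famB']
```

A family showing up as missing in an incomplete MAG, or as multi-copy in a contaminated one, would be one plausible reason for the marker count to differ between species groups.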
I am using MIDAS with a custom database built from my own MAGs. It is not clear to me what criteria are used to select marker genes. Is it just that they should be core (present in all MAGs) for a given species? Are there any relaxations of, or further constraints on, this?