gtonkinhill / panstripe

Post-processing of bacterial pangenome gene presence/absence matrices
GNU General Public License v2.0

Functional annotation of the exchanged genes #18

LucoDevro opened this issue 1 month ago (status: open)

LucoDevro commented 1 month ago

Hi Gerry,

I get that Panstripe's main goal is to model gene exchange rates, but would it be straightforward to get an idea of the functions of the exchanged genes? I'm thinking of listing the annotations of genes that have a non-zero chance of being exchanged according to the ancestral state reconstruction (ASR) embedded in Panstripe (the anc_states variable in the main panstripe.R function). For example, I'd like to get a COG profile of the exchanged genes for each node.

It would also be interesting to split this annotation profile into gained and lost genes, but you have already listed this as a possible enhancement (#10).
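As a minimal sketch of what such a split could look like: assuming binary (0/1) reconstructed states are available for every node, a gain on a branch is a gene absent at the parent and present at the child, and a loss is the reverse. The node-level state layout and the function name split_gains_losses are hypothetical and not necessarily what Panstripe returns.

```python
from typing import Dict, List, Tuple


def split_gains_losses(
    states: Dict[str, List[int]],
    edges: List[Tuple[str, str]],
    genes: List[str],
) -> Dict[Tuple[str, str], Dict[str, List[str]]]:
    """For each (parent, child) branch, list the genes gained and lost.

    Assumes states[node][i] is the reconstructed presence (1) or absence (0)
    of genes[i] at that node. This layout is a hypothetical assumption.
    """
    result = {}
    for parent, child in edges:
        p, c = states[parent], states[child]
        # Gain: absent at the parent, present at the child.
        gained = [g for g, a, b in zip(genes, p, c) if a == 0 and b == 1]
        # Loss: present at the parent, absent at the child.
        lost = [g for g, a, b in zip(genes, p, c) if a == 1 and b == 0]
        result[(parent, child)] = {"gained": gained, "lost": lost}
    return result


if __name__ == "__main__":
    genes = ["geneA", "geneB", "geneC"]
    states = {"root": [1, 0, 1], "n1": [1, 1, 0], "tip1": [0, 1, 0]}
    edges = [("root", "n1"), ("n1", "tip1")]
    for branch, change in split_gains_losses(states, edges, genes).items():
        print(branch, change)
```

The gained and lost lists per branch could then each be collapsed into their own COG profile.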

Thanks, Lucas

gtonkinhill commented 1 month ago

Hi Lucas,

Sorry for the slow reply; things have been quite busy.

I've just pushed a change that should allow you to retrieve the branch-level gene ancestral state matrix. You could then collapse these into COG categories as a post-processing step.
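A minimal sketch of that post-processing step follows. It assumes the branch-level matrix has been exported as one row of per-gene values per branch, where a value above threshold means the gene changed on that branch. The default threshold of 0.0 keeps any gene with a non-zero value. It also assumes each gene carries a single COG category letter. Both layouts and the function name cog_profile_per_branch are hypothetical. Genes annotated with several COG letters would need extra handling.

```python
from collections import Counter
from typing import Dict, List


def cog_profile_per_branch(
    branch_changes: Dict[str, List[float]],
    genes: List[str],
    gene_to_cog: Dict[str, str],
    threshold: float = 0.0,
) -> Dict[str, Counter]:
    """Count COG categories among genes whose value on a branch exceeds threshold.

    branch_changes[branch][i] is assumed to be the change value of genes[i]
    on that branch (hypothetical export format).
    """
    profiles = {}
    for branch, values in branch_changes.items():
        counts: Counter = Counter()
        for gene, value in zip(genes, values):
            if value > threshold:
                # Genes without a COG annotation are grouped under "unknown".
                counts[gene_to_cog.get(gene, "unknown")] += 1
        profiles[branch] = counts
    return profiles


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    gene_to_cog = {"g1": "L", "g2": "L", "g3": "V"}
    branch_changes = {"b1": [1, 0.4, 0, 1], "b2": [0, 0, 1, 0]}
    print(cog_profile_per_branch(branch_changes, genes, gene_to_cog))
```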

Alternatively, you could subset the gene presence/absence matrix into COG categories and run Panstripe separately on each.
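A minimal sketch of the subsetting step follows. It assumes the presence/absence matrix is held as a mapping from each gene to its 0/1 values across genomes and that each gene has one COG category. Both are assumptions, and the function name split_pa_matrix_by_cog is hypothetical. Each resulting submatrix would then be passed to Panstripe separately.

```python
from collections import defaultdict
from typing import Dict, List


def split_pa_matrix_by_cog(
    pa_matrix: Dict[str, List[int]],
    gene_to_cog: Dict[str, str],
) -> Dict[str, Dict[str, List[int]]]:
    """Split a gene presence/absence matrix (gene -> 0/1 per genome) by COG category."""
    subsets: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
    for gene, row in pa_matrix.items():
        # Unannotated genes end up in their own "unknown" submatrix.
        subsets[gene_to_cog.get(gene, "unknown")][gene] = row
    return dict(subsets)


if __name__ == "__main__":
    pa = {"g1": [1, 0, 1], "g2": [0, 1, 1], "g3": [1, 1, 1]}
    print(split_pa_matrix_by_cog(pa, {"g1": "L", "g3": "L"}))
```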

LucoDevro commented 3 weeks ago

Hi Gerry,

Oh, thanks, that was an easy fix; I could have done it myself. It's still a pity that it doesn't distinguish between gains and losses.

In the meantime, I've been tinkering with Csűrös' Count tool, which uses Wagner parsimony. Is your implementation of Sankoff's algorithm basically the same as Wagner parsimony, given that you hardcode equal transition costs and only two possible states?
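To make the comparison concrete, here is a minimal sketch of Sankoff's dynamic programming for a single presence/absence character on a rooted binary tree. With states {0, 1} and every change costing 1, the cost matrix is exactly |a − b|, the linear cost Wagner parsimony uses for integer states, so both give the same minimum score. The gain_cost/loss_cost parameters, the tree layout, and the function name are illustrative assumptions, not Panstripe's or Count's interface. They show where unequal gain and loss penalties would enter.

```python
import math
from typing import Dict, List, Tuple

# Rooted binary tree: internal node -> (left child, right child).
# Leaves are the nodes that never appear as keys.
Tree = Dict[str, Tuple[str, str]]


def sankoff_binary(
    tree: Tree,
    root: str,
    leaf_states: Dict[str, int],
    gain_cost: float = 1.0,
    loss_cost: float = 1.0,
) -> float:
    """Minimum parsimony cost of one presence/absence character (Sankoff DP).

    cost[a][b] is the cost of going from parent state a to child state b.
    With gain_cost == loss_cost == 1 this is the 0/1 matrix |a - b|.
    """
    cost = [[0.0, gain_cost], [loss_cost, 0.0]]

    def subtree_costs(node: str) -> List[float]:
        # For each state s in {0, 1}: minimum cost of the subtree below
        # `node` given that `node` is in state s.
        if node not in tree:
            return [0.0 if s == leaf_states[node] else math.inf for s in (0, 1)]
        children = [subtree_costs(child) for child in tree[node]]
        return [
            sum(min(cost[s][t] + c[t] for t in (0, 1)) for c in children)
            for s in (0, 1)
        ]

    return min(subtree_costs(root))


if __name__ == "__main__":
    tree = {"root": ("n1", "t3"), "n1": ("t1", "t2")}
    leaves = {"t1": 1, "t2": 1, "t3": 0}
    print(sankoff_binary(tree, "root", leaves))
    print(sankoff_binary(tree, "root", leaves, gain_cost=2.0))
```

Equal minimum scores do not guarantee identical ancestral reconstructions, since ties between equally parsimonious states may be broken differently by each implementation.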