neherlab / pan-genome-analysis

Processing pipeline for pan-genome visualization and exploration
http://pangenome.de
GNU General Public License v3.0

Gene gain/loss, duplication #20

Open mdieser opened 5 years ago

mdieser commented 5 years ago

Hi Richard, I hope this is my last question for you. I uploaded my output files to the server. Is there a way to extract the information in the 'gene cluster table' and 'sequence alignment window' to obtain gene duplication, gain, and loss events for each strain? I'm interested in making pie charts showing the percentage of ancestral genes, gene gains, gene losses, and duplication events. How was the geneGainLossEvent.json file in the TestSet folder generated? Would a similar file help me retrieve the information I am looking for?

Markus

rneher commented 5 years ago

Hi Markus,

I am not sure I fully understand. When you talk about "gain/loss events for a strain", do you mean all events that happened since the root of the strain tree, or only events that happened on the terminal branch of the tree that leads to the strain of interest? The geneCluster.json lists duplication and gain/loss events per gene, and for each gene lists the strains in which it is present. For each gene cluster there is also a GC_*_patterns.json that contains a key between 0 and 3 specifying whether the gene was present, absent, gained, or lost on every branch of the core genome tree. But I'd have to look up exactly how this is coded.

best, richard
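
A rough sketch of how the per-branch codes in the GC_*_patterns.json files might be tallied into pie-chart percentages, written in Python. Everything about the file layout here is an assumption, not something confirmed in this thread: the sketch assumes each file is a JSON object mapping branch (or strain) names to a code from 0 to 3, possibly nested under a top-level "patterns" key. The mapping from codes to present/absent/gained/lost is not given above, so the labels are deliberately left as placeholders. The function names (`load_pattern`, `tally_states`, `percentages`, `summarize`) are hypothetical helpers, not part of the pipeline. Duplication events are not covered, since they are described as living in geneCluster.json.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path

# ASSUMPTION: placeholder labels only. The thread does not say which code
# (0-3) means present, absent, gained or lost; fill these in once verified.
STATE_LABELS = {"0": "state_0", "1": "state_1", "2": "state_2", "3": "state_3"}


def load_pattern(path):
    """Read one GC_*_patterns.json file as {branch_name: state_code}.

    ASSUMPTION: the file is a JSON object keyed by branch name, optionally
    wrapped in a top-level "patterns" key. Adjust to the real layout.
    """
    with open(path) as handle:
        data = json.load(handle)
    if isinstance(data, dict) and isinstance(data.get("patterns"), dict):
        data = data["patterns"]
    # Normalize codes to strings so 0 and "0" are counted together.
    return {branch: str(code) for branch, code in data.items()}


def tally_states(patterns):
    """Count, per branch, how many gene clusters carry each state code."""
    counts = defaultdict(Counter)
    for pattern in patterns:
        for branch, code in pattern.items():
            counts[branch][code] += 1
    return counts


def percentages(counter):
    """Convert one branch's counts into percentages for a pie chart."""
    total = sum(counter.values())
    return {STATE_LABELS.get(code, code): 100.0 * n / total
            for code, n in counter.items()}


def summarize(folder):
    """Tally every GC_*_patterns.json file found in `folder`."""
    files = sorted(Path(folder).glob("GC_*_patterns.json"))
    return tally_states(load_pattern(f) for f in files)
```

Calling `percentages(summarize(folder)[branch_name])` would then give the shares for one branch. Note that this counts events on a single branch only, which matches the "terminal branch" reading of the question; summing events since the root would additionally require walking the tree from the root to each strain.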