z0on / emapper_to_GOMWU_KOGMWU

A few awk one-liners to extract data tables compatible with the GO_MWU and KOGMWU methods from eggNOG-mapper output

Extract GO and KOG-class annotations from eggNOG-mapper output

eggNOG-mapper is the method of choice to annotate transcriptomes of non-model organisms.

This tiny repository provides one-liners for extracting specific annotation data (gene names, Gene Ontology [GO], and euKaryotic Orthologous Groups [KOG]) from the eggNOG-mapper output table.

The goal is to generate annotations compatible with rank-based functional summary methods, KOGMWU and GO_MWU.

See Dixon et al., Science 2015, for examples of applying these methods.

Files included:

Extracting Gene Ontology annotations for GO_MWU:

awk -F "\t" 'BEGIN {OFS="\t" }{print $1,$6 }' Mcavernosa_euk.emapper.annotations | grep GO | perl -pe 's/,/;/g' >Mcavernosa_gene2go.tab
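
To make the effect of the GO one-liner concrete, here is a minimal sketch on a made-up six-column table. The file names `toy.emapper.annotations` and `toy_gene2go.tab`, the helper `mkrow`, and the gene IDs are all hypothetical. The filter is moved into awk so that it tests column 6 itself, because `grep GO` would also keep a row whose gene ID merely contains "GO". This is a variant of the command above, not the repository's own code, and the column position is simply taken from that command (it may differ between eggNOG-mapper versions).

```shell
# Toy stand-in for the eggNOG-mapper table: gene ID in column 1,
# GO terms in column 6; filler columns carry their own index.
mkrow() { printf '%s\t2\t3\t4\t5\t%s\n' "$1" "$2"; }
{
  mkrow gene1   'GO:0000001,GO:0000002'   # annotated gene
  mkrow gene2   ''                        # no GO terms
  mkrow GOgene3 ''                        # "GO" only in the ID, no GO terms
} > toy.emapper.annotations

# Keep rows whose column 6 holds at least one GO term,
# and replace the comma separators with semicolons.
awk 'BEGIN {FS=OFS="\t"} $6 ~ /GO:/ {gsub(/,/, ";", $6); print $1, $6}' \
    toy.emapper.annotations > toy_gene2go.tab

cat toy_gene2go.tab
```

Only `gene1` survives, with its two GO terms joined by a semicolon.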

Extracting KOG annotations for KOGMWU:

#  KOG classes (single-letter):
awk -F "\t" 'BEGIN {OFS="\t" }{print $1,$12 }' Mcavernosa_euk.emapper.annotations | grep -Ev "[,#S]" >mc_gene2kogClass1.tab
# Convert single-letter KOG classes to the class names understood by the KOGMWU package (kog_classes.txt must be in the same directory):
awk 'BEGIN {FS=OFS="\t"} NR==FNR {a[$1] = $2;next} {print $1,a[$2]}' kog_classes.txt mc_gene2kogClass1.tab > Mcavernosa_gene2kogClass.tab
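
The two KOG steps can be sketched end to end on toy data. Everything named `toy_*`, the helper `mkrow`, and the gene IDs are hypothetical, and the two-column layout of the lookup file (letter, tab, class name) is an assumption inferred from the awk command above. In this variant the filter looks only at column 12 instead of the whole line, since `grep -Ev "[,#S]"` would also discard any row whose gene ID happens to contain `S`, `#`, or a comma. Letters missing from the lookup are skipped rather than printed with an empty class.

```shell
# Toy lookup table standing in for kog_classes.txt (letter <TAB> class name).
printf 'J\tTranslation, ribosomal structure and biogenesis\n' >  toy_kog_classes.txt
printf 'T\tSignal transduction mechanisms\n'                  >> toy_kog_classes.txt

# Toy annotation table: gene ID in column 1, KOG letter(s) in column 12;
# filler columns carry their own index.
mkrow() { printf '%s\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t%s\n' "$1" "$2"; }
{
  mkrow '#query' 'COG_cat'   # comment/header line
  mkrow geneS1   'J'         # ID contains a capital S
  mkrow gene2    'K,T'       # more than one class
  mkrow gene3    'S'         # function unknown
  mkrow gene4    'T'
} > toy.emapper.annotations

# Step 1: keep genes with exactly one informative class letter.
awk 'BEGIN {FS=OFS="\t"} $1 !~ /^#/ && $12 != "" && $12 !~ /[,S]/ {print $1, $12}' \
    toy.emapper.annotations > toy_gene2kogClass1.tab

# Step 2: the first file (NR==FNR) fills the lookup array a[letter] = name;
# the second file is then translated line by line.
awk 'BEGIN {FS=OFS="\t"} NR==FNR {a[$1]=$2; next} ($2 in a) {print $1, a[$2]}' \
    toy_kog_classes.txt toy_gene2kogClass1.tab > toy_gene2kogClass.tab

cat toy_gene2kogClass.tab
```

`NR==FNR` is true only while awk reads the first file, which is what lets a single command load the lookup and then apply it.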

Extracting gene names:

awk -F "\t" 'BEGIN {OFS="\t" }{print $1,$13 }' Mcavernosa_euk.emapper.annotations | grep -Ev "\tNA" >Mcavernosa_gene2geneName.tab
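
As a hedged alternative for the gene-name step, the sketch below drops unnamed genes inside awk by comparing column 13 with the exact string `NA`. The motivation is twofold: some grep implementations do not read `\t` as a tab, and a pattern matching a tab followed by `NA` would also remove legitimate names that begin with "NA". The `toy_*` file names, the helper `mkrow`, and the gene IDs and names are made up for illustration. The column position and the `NA` placeholder are taken from the one-liner above.

```shell
# Toy annotation table: gene ID in column 1, gene name in column 13;
# filler columns carry their own index.
mkrow() { printf '%s\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t%s\n' "$1" "$2"; }
{
  mkrow gene1 'ACTB'   # named gene
  mkrow gene2 'NA'     # no gene name
  mkrow gene3 'NADK'   # name that merely starts with "NA"
} > toy.emapper.annotations

# Keep rows whose column 13 is neither empty nor exactly "NA".
awk 'BEGIN {FS=OFS="\t"} $1 !~ /^#/ && $13 != "" && $13 != "NA" {print $1, $13}' \
    toy.emapper.annotations > toy_gene2geneName.tab

cat toy_gene2geneName.tab
```

Here `gene2` is dropped, while `gene3` is kept because its name is not exactly `NA`.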