qiyunzhu / woltka

Woltka: a versatile meta'omic data classifier
BSD 3-Clause "New" or "Revised" License

KEGG pathways detected are not represented in pathway coverage results #187

Open LLansing opened 11 months ago

LLansing commented 11 months ago

I have generated KEGG KO annotation results (woltka classify) and pathway results (woltka collapse on KO results), and finally pathway coverage results (woltka coverage on KO results with KEGG pathway-to-ko.txt mapping file built from the kegg_query.py helper script).

These steps seem to have worked, but when I compared the KEGG pathways ("maps") across outputs, a small number were present in the pathway count results but absent from the coverage results (and from the pathway-to-ko.txt file).

All 16 of these discrepant pathways are categorized within KEGG's database as Global and overview maps, Drug resistance: antimicrobial, or Drug resistance: antineoplastic.

Do you know why these categories are excluded from the kegg_query.py output, and therefore from the coverage results? I am not claiming there is a bug or problem, but I would like to understand the reason so I can be sure I'm not missing anything.
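
For anyone who wants to reproduce the comparison described above, here is a minimal sketch. It assumes the collapsed pathway table is tab-delimited with the pathway ID in the first column and a header line starting with "#", and that each line of pathway-to-ko.txt starts with a pathway ID followed by tab-separated KOs. Both layouts are assumptions, the function names are hypothetical, and the toy data is invented for illustration.

```python
def read_ids(lines):
    """Return the first tab-delimited field of each non-empty, non-comment line."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines (assumed to start with "#").
        if not line or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def missing_from_mapping(profile_lines, mapping_lines):
    """Pathways present in the profile but absent from the mapping file."""
    return sorted(read_ids(profile_lines) - read_ids(mapping_lines))


# Toy data, invented for illustration only (not real results).
profile = ["#FeatureID\tS1\tS2", "ko00010\t5\t3", "ko01100\t9\t7"]
mapping = ["ko00010\tK00001\tK00002"]

print(missing_from_mapping(profile, mapping))
```

With real files, the two lists can be replaced by open file handles, since the functions only iterate over lines.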

qiyunzhu commented 11 months ago

@LLansing Thanks for your interest in the program. To be honest, I don't know the exact answer, but my guess is that KEGG may have changed its rules for naming pathways. In the past, there were prefixes like "ko", "map" and "rn", which were almost the same thing. But in some cases a pathway has a "map" entry but no "ko" entry (for example). I suspect that this discrepancy caused the 16 pathways to lose their KO members. I don't have a robust solution for this. If you are able to manually obtain the membership information from the KEGG website, you may be able to update the database files to include them.
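
One way to check the prefix hypothesis is to compare pathway IDs by their numeric part only. The sketch below is an illustration under assumptions, not part of Woltka: it assumes IDs consist of a lowercase prefix followed by five digits (such as "ko00010" or "map00010"), the helper names are hypothetical, and the example IDs are chosen only for demonstration.

```python
import re

# Assumed ID shape: lowercase prefix ("ko", "map", "rn", ...) plus five digits.
_PATHWAY_ID = re.compile(r"^([a-z]+)(\d{5})$")


def pathway_number(pathway_id):
    """Return the five-digit part of a pathway ID, or None if it does not match."""
    match = _PATHWAY_ID.match(pathway_id)
    return match.group(2) if match else None


def prefix_only_mismatches(detected, mapped):
    """Detected IDs missing from the mapping verbatim, but whose number
    appears in the mapping under a different prefix."""
    mapped = set(mapped)
    mapped_numbers = {pathway_number(p) for p in mapped} - {None}
    return sorted(
        p for p in set(detected) - mapped
        if pathway_number(p) in mapped_numbers
    )


# Illustrative IDs only.
detected = ["ko00010", "map01100", "ko01501"]
mapped = ["ko00010", "ko01100"]

print(prefix_only_mismatches(detected, mapped))
```

IDs returned here differ from a mapped pathway only by prefix. Detected IDs that are not returned and are also absent from the mapping have no counterpart under any prefix, so their KO members would need to be obtained from KEGG manually, as suggested above.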