microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

Data Portal KEGG pathway search issue with pathway has ko prefix #1342

Closed aclum closed 2 months ago

aclum commented 3 months ago

When I use the functional search with ko00010, it does not populate the term name.

See https://www.genome.jp/kegg/kegg1b.html for the expected prefixes.

This term should populate with the name Glycolysis / Gluconeogenesis:

https://www.genome.jp/dbget-bin/www_bget?pathway+ko00010

naglepuff commented 3 months ago

Looks like we ingest those with the map prefix, and this is called out in the KEGG search flyout: [image]

So searching for map00010 returns this result: [image]

What's the difference between the ko and map prefixes? It looks like one is called the image_id and the other is the ko_pathway_id. I did a quick check of the file we use to ingest pathway terms, and for every row in that file, the 5-digit number following those prefixes is the same.
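The quick check described above could be sketched roughly as follows. The rows and the idea that the ingest file exposes `image_id` and `ko_pathway_id` as named columns are assumptions for illustration; the actual file and its layout are not shown in this thread.

```python
import re

# Hypothetical rows standing in for the ingest file; the real file is not shown here.
ROWS = [
    {"image_id": "map00010", "ko_pathway_id": "ko00010"},
    {"image_id": "map00020", "ko_pathway_id": "ko00020"},
]


def numeric_suffix(term_id: str) -> str:
    """Return the trailing 5-digit number of an ID such as 'map00010'."""
    match = re.search(r"(\d{5})$", term_id)
    if match is None:
        raise ValueError(f"no 5-digit suffix in {term_id!r}")
    return match.group(1)


def mismatched_rows(rows):
    """Return the rows whose two IDs do not share the same 5-digit suffix."""
    return [
        row
        for row in rows
        if numeric_suffix(row["image_id"]) != numeric_suffix(row["ko_pathway_id"])
    ]
```

An empty list from `mismatched_rows(ROWS)` would mean the two columns differ only in their prefix.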

I'm certainly open to making changes here for a better/more correct user experience, I just lack the knowledge about KEGG terms to know what's right and want to make sure that the current functionality is well documented here.

aclum commented 3 months ago

The map prefix is the reference KEGG pathway map. The ec, rn, and ko prefixes redirect to the same pathway map but with different color coding.

https://www.genome.jp/pathway/map00010 is the reference pathway for Glycolysis / Gluconeogenesis.

https://www.genome.jp/pathway/ko00010 is the same pathway (Glycolysis / Gluconeogenesis) but with colored boxes for KEGG terms.

See the color coding section of the KEGG pathway help information.

See also the prefix info at https://www.genome.jp/kegg/kegg3.html. There are no plans at this time to support organism-specific searches, so disregard those prefixes.

Notes about how we are resolving KEGG pathways in the data portal, and about color coding: The registered CURIEs for KEGG pathways at bioregistry and identifiers.org redirect to https://www.kegg.jp/entry/$PATHWAY_ID, which does not have the fun color coding that https://www.genome.jp/pathway does (nmdc-schema uses https://bioregistry.io/kegg.pathway). cc @turbomam

It does appear that the resolver the data portal code is using (https://www.genome.jp/kegg-bin/show_pathway?$PATHWAY_ID) links to the color-coded versions.
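For reference, the three URL patterns mentioned in this thread can be compared side by side with a small sketch. The dictionary keys and the function name are illustrative labels only, not names from the data portal code.

```python
from string import Template

# URL patterns quoted in this thread; the keys are illustrative labels.
RESOLVERS = {
    "entry": Template("https://www.kegg.jp/entry/$pathway_id"),
    "pathway": Template("https://www.genome.jp/pathway/$pathway_id"),
    "show_pathway": Template("https://www.genome.jp/kegg-bin/show_pathway?$pathway_id"),
}


def pathway_url(pathway_id: str, resolver: str = "show_pathway") -> str:
    """Fill the chosen URL pattern with a pathway ID such as 'ko00010'."""
    return RESOLVERS[resolver].substitute(pathway_id=pathway_id)


if __name__ == "__main__":
    for name in RESOLVERS:
        print(name, pathway_url("ko00010", name))
```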

To fully support KEGG pathways, I'd like to see the ec, rn, and ko prefixes supported. You could use the pathway names from the table already being used and the same URL to resolve hyperlinks.

If this is possible and you update the allowed format in the search flyout, it would be great to lowercase MAP so that we are consistent with the KEGG identifiers. The expected format would then be: K00000, M00000, map00000, ko00000, rn00000, ec00000
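A minimal sketch of what validating the proposed formats might look like, assuming each ID is a fixed prefix followed by exactly five digits. The function and dictionary names are hypothetical and do not come from the nmdc-server code.

```python
import re

# Patterns for the formats listed above; matching is case-sensitive on purpose,
# so 'MAP00010' is rejected while 'map00010' is accepted.
KEGG_TERM_PATTERNS = {
    "orthology": re.compile(r"K\d{5}"),
    "module": re.compile(r"M\d{5}"),
    "pathway": re.compile(r"(?:map|ko|rn|ec)\d{5}"),
}


def classify_kegg_term(term: str):
    """Return the kind of KEGG term, or None if it matches no expected format."""
    for kind, pattern in KEGG_TERM_PATTERNS.items():
        if pattern.fullmatch(term):
            return kind
    return None
```

Using `fullmatch` rather than `match` keeps inputs such as `ko000100` from being accepted on the strength of their first seven characters.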

naglepuff commented 3 months ago

@aclum does this screenshot capture what you would expect for results for a search of ec00010? image

aclum commented 3 months ago

Yes, and the search should then return the same number of samples as a search for map00010.
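One way to get identical results for every prefix would be to normalize the search term to the map form that is already ingested before looking anything up. This is only a sketch under that assumption; `PATHWAY_NAMES` is a hypothetical stand-in for the ingested pathway table, and the function names are not from the actual code.

```python
import re

PATHWAY_ID = re.compile(r"(map|ko|rn|ec)(\d{5})")

# Hypothetical stand-in for the ingested pathway table, keyed by map ID.
PATHWAY_NAMES = {"map00010": "Glycolysis / Gluconeogenesis"}


def to_map_id(pathway_id: str) -> str:
    """Rewrite a ko/rn/ec/map pathway ID to its map-prefixed equivalent."""
    match = PATHWAY_ID.fullmatch(pathway_id)
    if match is None:
        raise ValueError(f"not a supported pathway ID: {pathway_id!r}")
    return "map" + match.group(2)


def pathway_name(pathway_id: str):
    """Look up the pathway name regardless of which prefix was searched."""
    return PATHWAY_NAMES.get(to_map_id(pathway_id))
```

Because every prefix collapses to the same key, a sample query built from `to_map_id("ec00010")` would be the same query as one built from `map00010`.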

turbomam commented 2 months ago

I support this. How is it coming along? Is this in a similar vein as the request for Pfam clade prefix?

Do you need anything else from me?

aclum commented 2 months ago

@turbomam nothing needed here from you. The CURIE prefix is KEGG.PATHWAY in both cases; this ticket is to support different prefixes for the term ID itself, e.g. support both KEGG.PATHWAY:ko00010 and KEGG.PATHWAY:ec00010.