prefixcommons / biocontext

JSON-LD Contexts for Bioinformatics Data

Unbounded 'prefix' variants declared locally by KEGG #7

Open jmcmurry opened 8 years ago

jmcmurry commented 8 years ago

KEGG's gene IDs already contain a colon; however, the prefix before that colon identifies a species, and the number of species is virtually unbounded.

Normally, I'd recommend respecting the prefix declared by a provider. However, in the case of KEGG gene IDs, doing so would gobble up too many prefixes, many of which may already be in use by others.

Options are:

1) Live with double prefixes, e.g. KEGG:hsa:6469
2) Try to enumerate the prefixes of all known species and declare them as appended to 'KEGG', e.g. KEGG-hsa, KEGG-ptr, KEGG-pps, etc.
3) Convince KEGG's resolver to honor underscore-delimited versions instead
4) Barring 3, reroute to someone else willing to parse the local ID
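To make the first three options concrete, here is a rough Python sketch of how one KEGG gene ID would look under each. The function names are made up for illustration, and the underscore form in option 3 (`KEGG:hsa_6469`) is only my reading of "underscore-delimited"; nothing here reflects what KEGG's resolver actually accepts.

```python
def split_kegg_gene(local_id: str) -> tuple[str, str]:
    """Split a KEGG gene ID such as 'hsa:6469' into (species prefix, entry)."""
    org, sep, entry = local_id.partition(":")
    if not sep or not org or not entry:
        raise ValueError(f"not a species-prefixed KEGG gene ID: {local_id!r}")
    return org, entry


def option1_double_prefix(local_id: str) -> str:
    # Option 1: keep the local ID untouched and prepend 'KEGG'.
    split_kegg_gene(local_id)  # validate only
    return f"KEGG:{local_id}"


def option2_per_species_prefix(local_id: str) -> str:
    # Option 2: fold the species code into the prefix itself.
    org, entry = split_kegg_gene(local_id)
    return f"KEGG-{org}:{entry}"


def option3_underscore(local_id: str) -> str:
    # Option 3 (assumed form): replace the inner colon with an underscore.
    org, entry = split_kegg_gene(local_id)
    return f"KEGG:{org}_{entry}"


if __name__ == "__main__":
    for convert in (option1_double_prefix, option2_per_species_prefix, option3_underscore):
        print(convert("hsa:6469"))
```

Only option 1 leaves the local ID exactly as KEGG declares it; options 2 and 3 both need something that can split the ID on its inner colon.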

@cmungall thoughts?

See for example here.

hsa:6469 ptr:743371 pps:100995872 ggo:101146888 pon:100461003 nle:100582247 mcc:716553 mcf:102120970 rro:104666156 cjc:100395600 mmu:20423 mmu:320038 rno:29499 cge:100773402 ngi:103745341 hgl:101719896 ocu:100352774
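As a quick illustration of how fast the prefixes pile up, this sketch groups the example IDs above by their species prefix. The helper name `group_by_species_prefix` is hypothetical; the IDs are copied from the list above.

```python
EXAMPLE_IDS = (
    "hsa:6469 ptr:743371 pps:100995872 ggo:101146888 pon:100461003 "
    "nle:100582247 mcc:716553 mcf:102120970 rro:104666156 cjc:100395600 "
    "mmu:20423 mmu:320038 rno:29499 cge:100773402 ngi:103745341 "
    "hgl:101719896 ocu:100352774"
).split()


def group_by_species_prefix(ids: list[str]) -> dict[str, list[str]]:
    """Map each species prefix (the text before the colon) to its IDs."""
    groups: dict[str, list[str]] = {}
    for local_id in ids:
        org, _, _ = local_id.partition(":")
        groups.setdefault(org, []).append(local_id)
    return groups


if __name__ == "__main__":
    groups = group_by_species_prefix(EXAMPLE_IDS)
    # Each key is a prefix that would have to be reserved under option 2.
    print(len(EXAMPLE_IDS), "IDs,", len(groups), "distinct prefixes")
    print(sorted(groups))
```

Only `mmu` repeats in this sample, so nearly every ID brings a new prefix with it.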