obophenotype / ncbitaxon

Build for NCBITaxon
BSD 3-Clause "New" or "Revised" License

What are GC_IDs? #47

Open cthoyt opened 3 years ago

cthoyt commented 3 years ago

Most terms have an xref to a namespace with a prefix GC_ID. Is anyone familiar with what that is or what it abbreviates?

jamesaoverton commented 3 years ago

I have a partial answer. The ncbitaxon.owl file is a direct translation of the taxdmp.zip file available here: https://ftp.ncbi.nih.gov/pub/taxonomy/. In that directory is a taxdmp_readme.txt that explains the various fields. "GC" is their abbreviation for "genetic code", and GC_ID points to a gencode.dmp file that we do not translate. Official NCBI Taxonomy pages include a "Genetic code" field with a link, e.g. https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606&lvl=3&lin=f&keep=1&srchmode=1&unlock. That's as much as I know.
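For anyone who wants to see where the GC_ID value comes from, here is a minimal Python sketch of reading it out of a nodes.dmp row. It assumes the column order described in the taxdump readme (genetic code id in the seventh column, mitochondrial genetic code id in the ninth) and the `\t|\t` field separator with a trailing `\t|`. The function names and the sample row are made up for illustration; the row is not copied from an actual release.

```python
# Column order as described in the NCBI taxdump readme (assumed here;
# check the readme that ships with the dump you actually download).
NODES_COLUMNS = [
    "tax_id", "parent_tax_id", "rank", "embl_code", "division_id",
    "inherited_div_flag", "genetic_code_id", "inherited_gc_flag",
    "mito_genetic_code_id", "inherited_mgc_flag", "genbank_hidden_flag",
    "hidden_subtree_root_flag", "comments",
]


def parse_dmp_line(line):
    """Split one .dmp row into its fields (hypothetical helper)."""
    line = line.rstrip("\n")
    # Every row ends with a tab and a pipe; drop that terminator first.
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def genetic_codes(line):
    """Return (tax_id, nuclear GC_ID, mitochondrial GC_ID) for one row."""
    record = dict(zip(NODES_COLUMNS, parse_dmp_line(line)))
    return (
        record["tax_id"],
        record["genetic_code_id"],
        record["mito_genetic_code_id"],
    )


# Illustrative row for Homo sapiens (9606), built by hand for this example.
sample = "\t|\t".join(
    ["9606", "9605", "species", "HS", "5", "1", "1", "1", "2", "1", "1", "0", ""]
) + "\t|\n"

print(genetic_codes(sample))  # ('9606', '1', '2')
```

The values stay as strings on purpose, since they are used as identifiers rather than as numbers.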

cthoyt commented 3 years ago

Thanks @jamesaoverton, that's much appreciated. It's unbelievable how many nomenclatures the NCBI has generated...

cmungall commented 3 years ago

FWIW, UMLS doesn't translate this either: https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/NCBI/sourcerepresentation.html

I suggest

  1. Register something like NCBI.gc with identifiers.org / n2t.net
  2. Point this at URLs like https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG2 or https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG2
  3. Have an annotation/xref pointing to this
  4. (stretch) Have some kind of ontological rendering of gencode.dmp (btw, did the file move? I don't see it). E.g.
    • taxon has-part (nuclear genome and has-part some translation system GC_ID)
    • GC_ID a SP:codon, label "ATG", starts-with some (adenine and followed-by thymine and ends-with guanine) encodes chebi:methionine
    • this injects a bunch of blank nodes into the ontology with no real priority use case and would be for the sake of ontological completeness, so YMMV....
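
Steps 1 to 3 could be sketched roughly as follows in Python. This assumes the `#SG<n>` anchor on the wprintgc.cgi page generalizes from the `#SG2` example above to other genetic code numbers, which I have not verified for every table. The `NCBI.gc` prefix is the hypothetical one proposed in step 1, and the function names are invented for illustration.

```python
# Base page taken from the example URL in step 2.
GC_PAGE = "https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi"

# Hypothetical prefix from step 1; not a registered name as far as this sketch knows.
GC_PREFIX = "NCBI.gc"


def gc_url(gc_id):
    """Build a link to one genetic code table, assuming an #SG<n> anchor."""
    return f"{GC_PAGE}#SG{int(gc_id)}"


def gc_curie(gc_id):
    """Build a compact identifier to use as the xref value."""
    return f"{GC_PREFIX}:{int(gc_id)}"


print(gc_curie("2"), gc_url("2"))
```

Going through `int()` rejects anything that is not a plain number, so a malformed xref fails loudly instead of producing a dead link.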

cthoyt commented 2 years ago

FYI: This has been registered in the Bioregistry at http://bioregistry.io/registry/gc