obophenotype / uberon

An ontology of gross anatomy covering metazoa. Works in concert with https://github.com/obophenotype/cell-ontology
http://obophenotype.github.io/uberon/

Deconvolute non-standard prefixes #2205

Open cthoyt opened 2 years ago

cthoyt commented 2 years ago

The following prefixes show up in various places in UBERON but they are not in the Bioregistry, based on the OQUAT analysis in https://biopragmatics.github.io/oquat/unknowns/source/uberon and https://biopragmatics.github.io/oquat/invalids/source/uberon:

prefix count example_node example_val
OBOL 3401 http://purl.obolibrary.org/obo/UBERON_0000031 OBOL:automatic
GAID 814 http://purl.obolibrary.org/obo/UBERON_0000002 GAID:376
PHENOSCAPE 281 http://purl.obolibrary.org/obo/UBERON_4200008 PHENOSCAPE:wd
BM 246 http://purl.obolibrary.org/obo/UBERON_0000007 BM:Die-Hy-HY
FBC 121 http://purl.obolibrary.org/obo/UBERON_0000122 FBC:DOS
UBERONTEMP 105 http://purl.obolibrary.org/obo/UBERON_0016929 UBERONTEMP:0ea3066e-0c22-417b-8ac4-91c2aacba792
GOC 70 http://purl.obolibrary.org/obo/UBERON_0000017 GOC:GO
ABA 62 http://purl.obolibrary.org/obo/UBERON_0000955 ABA:Brain
UBERONREF 45 http://purl.obolibrary.org/obo/UBERON_0000075 UBERONREF:0000003
MURDOCH 38 http://purl.obolibrary.org/obo/UBERON_0011472 MURDOCH:2183
WikipediaVersioned 31 http://purl.obolibrary.org/obo/UBERON_8410000 WikipediaVersioned:Duodenojejunal_flexure&oldid=937307798
BSA 26 http://purl.obolibrary.org/obo/UBERON_0000020 BSA:0000121
FEED 20 http://purl.obolibrary.org/obo/UBERON_0001572 FEED:rd
Dorlands_Medical_Dictionary 16 http://purl.obolibrary.org/obo/UBERON_0000313 Dorlands_Medical_Dictionary:MerckSource
ANISEED 13 http://purl.obolibrary.org/obo/UBERON_0000160 ANISEED:1235303
OGES 13 http://purl.obolibrary.org/obo/UBERON_0000068 OGES:000022
NominaAnatomicaVeterinaria 12 http://purl.obolibrary.org/obo/UBERON_0001451 NominaAnatomicaVeterinaria:2005
LG 11 http://purl.obolibrary.org/obo/UBERON_0004889 LG:0012616
OldNeuroNames 9 http://purl.obolibrary.org/obo/UBERON_0002575 OldNeuroNames:-1761421113
BILS 9 http://purl.obolibrary.org/obo/UBERON_0000105 BILS:0000105
BilaDO 9 http://purl.obolibrary.org/obo/UBERON_0000066 BilaDO:0000004
BRAINSPAN 8 http://purl.obolibrary.org/obo/UBERON_0014736 BRAINSPAN:BRAINSPAN
NIFSTD_RETIRED 8 http://purl.obolibrary.org/obo/UBERON_0000966 NIFSTD_RETIRED:birnlex_1156
Geisha 7 http://purl.obolibrary.org/obo/UBERON_0003052 Geisha:syn
WikipediaCategory 7 http://purl.obolibrary.org/obo/UBERON_0000474 WikipediaCategory:Female_reproductive_system
XtroDO 7 http://purl.obolibrary.org/obo/UBERON_0000066 XtroDO:0000084
Bgee 5 http://purl.obolibrary.org/obo/UBERON_0018241 Bgee:AN
XB 5 http://purl.obolibrary.org/obo/UBERON_0003056 XB:curator
NeuroNamesCNID 5 http://purl.obolibrary.org/obo/UBERON_0015510 NeuroNamesCNID:177
BrainInfo 4 http://purl.obolibrary.org/obo/UBERON_8440010 BrainInfo:2102
NIF 4 http://purl.obolibrary.org/obo/UBERON_0009630 NIF:NIF
DHB 3 http://purl.obolibrary.org/obo/UBERON_0002739 DHB:MD
J 3 http://purl.obolibrary.org/obo/UBERON_0002233 J:77634
PhenoscapeRCN 3 http://purl.obolibrary.org/obo/UBERON_0012260 PhenoscapeRCN:Oct2012
CUMBO 2 http://purl.obolibrary.org/obo/UBERON_0001020 CUMBO:CUMBO
INCF 2 http://purl.obolibrary.org/obo/UBERON_0001880 INCF:Seattle_mtg_2010
MorphoBank 2 http://purl.obolibrary.org/obo/UBERON_0013614 MorphoBank:177
NominaAnatomica 2 http://purl.obolibrary.org/obo/UBERON_0010356 NominaAnatomica:NA
Obol 2 http://purl.obolibrary.org/obo/UBERON_0003281 Obol:obol
PAPUB 2 http://purl.obolibrary.org/obo/UBERON_2001162 PAPUB:0000142
Phenoscape 2 http://purl.obolibrary.org/obo/UBERON_4000164 Phenoscape:PM
Swanson 2 http://purl.obolibrary.org/obo/UBERON_0001893 Swanson:2004
NIF_Organism 2 http://purl.obolibrary.org/obo/UBERON_0007221 NIF_Organism:birnlex_695
NOID 2 http://purl.obolibrary.org/obo/UBERON_0018367 NOID:1
OGEM 2 http://purl.obolibrary.org/obo/UBERON_0000307 OGEM:000006
BioMart 1 http://purl.obolibrary.org/obo/UBERON_0000363 BioMart:BioMart
CHECKME 1 http://purl.obolibrary.org/obo/UBERON_0003997 CHECKME:CHECKME
Giesha 1 http://purl.obolibrary.org/obo/UBERON_0005421 Giesha:syn
Hymans 1 http://purl.obolibrary.org/obo/UBERON_0010260 Hymans:Hymans
MTB 1 http://purl.obolibrary.org/obo/UBERON_0002145 MTB:379
AOO 1 http://purl.obolibrary.org/obo/UBERON_3000406 AOO:LAP
ASD 1 http://purl.obolibrary.org/obo/UBERON_3010449 ASD:BJB
Fast_Health_Medical_Dictionary 1 http://purl.obolibrary.org/obo/UBERON_0008230 Fast_Health_Medical_Dictionary:http://www.fasthealth.com/dictionary/
NCBI 1 http://purl.obolibrary.org/obo/UBERON_0001471 NCBI:matt
OMD 1 http://purl.obolibrary.org/obo/UBERON_0003075 OMD:neural+plate
PATOC 1 http://purl.obolibrary.org/obo/UBERON_0005160 PATOC:MAH
PLB 1 http://purl.obolibrary.org/obo/UBERON_0013730 PLB:plb
Renal_Physiology 1 http://purl.obolibrary.org/obo/UBERON_0008404 Renal_Physiology:Section_7
WA 1 http://purl.obolibrary.org/obo/UBERON_0003049 WA:dh
Wiktionary 1 http://purl.obolibrary.org/obo/UBERON_7500117 Wiktionary:opisthocranion
bgee 1 http://purl.obolibrary.org/obo/UBERON_0036219 bgee:ANN
ref 1 http://purl.obolibrary.org/obo/UBERON_0004870 ref:Stedmans
DrerDO 1 http://purl.obolibrary.org/obo/UBERON_0004707 DrerDO:0000052
MAP 1 http://purl.obolibrary.org/obo/UBERON_0001155 MAP:0000001
TA2 1 http://purl.obolibrary.org/obo/UBERON_8410000 TA2:2952
Talairach 1 http://purl.obolibrary.org/obo/UBERON_0035933 Talairach:1047

Generated by the following code:

from collections import Counter

import requests
from tabulate import tabulate

URL = "https://raw.githubusercontent.com/biopragmatics/oquat/main/results/uberon.json"
PACKS = ["synonym_pack", "prov_pack", "xref_pack"]


def main():
    # Fetch the OQUAT analysis results for UBERON
    response = requests.get(URL, timeout=60)
    response.raise_for_status()
    data = response.json()

    counter = Counter()  # prefix -> number of occurrences
    examples = {}  # prefix -> an example (node URI, value) pair
    # Use a separate loop variable rather than shadowing `data`
    for result in data["results"].values():
        for key in PACKS:
            for prefix, uri_to_value_dict in result[key]["unknown_prefixes"].items():
                if not uri_to_value_dict:
                    continue
                counter[prefix] += len(uri_to_value_dict)
                examples[prefix] = next(iter(uri_to_value_dict.items()))

    # Most frequent prefixes first
    rows = [(prefix, count, *examples[prefix]) for prefix, count in counter.most_common()]

    print(
        tabulate(
            rows, headers=["prefix", "count", "example_node", "example_val"], tablefmt="github"
        )
    )


if __name__ == "__main__":
    main()

Any help figuring out what these are and how they're used would be appreciated!

patrick-lloyd-ray commented 2 years ago

I think I might know some of these, as they look like Allen Institute-related prefixes:

DHBA = developing human brain atlas (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-dhba.obo)
HBA = human brain atlas (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-hba.obo)
DMBA = developing mouse brain atlas (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-dmba.obo)
MBA = mouse brain atlas (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-mba.obo)
ABA = Allen Brain Atlas; this is the name for all of the brain atlases at Allen (https://en.wikipedia.org/wiki/Allen_Brain_Atlas), see also https://github.com/obophenotype/ABA_Uberon
PBA = primate brain atlas (non-human) (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-pba.obo)

I don't know the history of their usage very well (these prefixes pre-date my working at the Allen Institute), but I know they were used in some mapping projects and we still have a need for them. I think I also recognize these two:

NLX = NeuroLex
nlx_subcell = NeuroLex subcellular structure

I don't know how these were used, but @tgbugs would probably know.

cthoyt commented 2 years ago

@patrick-lloyd-ray that's an excellent start! Thanks so much. For anyone who might be able to provide more context: I'm also looking for web references describing what these things are and, if possible, links to lists of the terms that go in each, or to ontology files if they exist.

tgbugs commented 2 years ago

Here are the mappings that I have.

@prefix MBA: <http://api.brain-map.org/api/v2/data/Structure/> .
@prefix HBA: <http://api.brain-map.org/api/v2/data/Structure/> .
@prefix DHBA: <http://api.brain-map.org/api/v2/data/Structure/> .
@prefix DMBA: <http://api.brain-map.org/api/v2/data/Structure/> .
@prefix NLX: <http://uri.neuinfo.org/nif/nifstd/nlx_> .
@prefix NLXSUB: <http://uri.neuinfo.org/nif/nifstd/nlx_subcell_> .
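For illustration, a minimal sketch of how a prefix map like this is applied: a CURIE expands by appending its local ID to the prefix's namespace. It assumes a plain Python dict in place of a Turtle parser; `expand_curie` is a hypothetical helper, `100315` is the local ID from the GO xref mentioned in this thread, and `12345` is an arbitrary placeholder.

```python
# Prefix map transcribed from the Turtle @prefix declarations above.
PREFIX_MAP = {
    "MBA": "http://api.brain-map.org/api/v2/data/Structure/",
    "HBA": "http://api.brain-map.org/api/v2/data/Structure/",
    "DHBA": "http://api.brain-map.org/api/v2/data/Structure/",
    "DMBA": "http://api.brain-map.org/api/v2/data/Structure/",
    "NLX": "http://uri.neuinfo.org/nif/nifstd/nlx_",
    "NLXSUB": "http://uri.neuinfo.org/nif/nifstd/nlx_subcell_",
}


def expand_curie(curie: str, prefix_map: dict) -> str:
    """Expand PREFIX:LOCAL_ID into a full URI (hypothetical helper)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return prefix_map[prefix] + local_id


if __name__ == "__main__":
    print(expand_curie("NLXSUB:100315", PREFIX_MAP))
    # The four Allen prefixes share one namespace, so these expand identically.
    print(expand_curie("MBA:12345", PREFIX_MAP) == expand_curie("HBA:12345", PREFIX_MAP))
```

One consequence worth noting: because MBA, HBA, DHBA, and DMBA share a namespace, the atlas is only distinguishable at the CURIE level, not in the expanded URI.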

ABA refers to the very first version of the terminology, which was modelled in OWL using subClassOf instead of partOf (https://bioportal.bioontology.org/ontologies/ABA-AMB). I think the mapping is:

@prefix ABA: <http://mouse.brain-map.org/atlas/index.html#> .

cmungall commented 2 years ago

Thanks @tgbugs and @patrick-lloyd-ray, this is correct.

I think we can replace all ABA xrefs and use MBA instead

As far as I know we don't have official prefixes for the Allen atlases yet, @dosumis? As soon as we get these, we should add them to the Bioregistry. But until then these xrefs are vital.

cmungall commented 2 years ago

@tgbugs we of course wouldn't have NLXSUB in Uberon, but GO uses xrefs like NIF_Subcellular:nlx_subcell_100315. If NLXSUB is your preferred prefix, let's register it and use it in GO!
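A minimal sketch of the rewrite this would imply for such GO xrefs, assuming the NLXSUB namespace from tgbugs' prefix list (which already ends in `nlx_subcell_`), so the repeated `nlx_subcell_` is dropped from the local ID. `rewrite_xref` is a hypothetical helper; xrefs that don't match the pattern pass through unchanged and would need separate handling.

```python
OLD_PREFIX = "NIF_Subcellular"
NEW_PREFIX = "NLXSUB"
EMBEDDED = "nlx_subcell_"  # namespace fragment repeated inside the old local IDs


def rewrite_xref(xref: str) -> str:
    """Rewrite NIF_Subcellular:nlx_subcell_N to NLXSUB:N; leave other xrefs unchanged."""
    prefix, sep, local_id = xref.partition(":")
    if sep and prefix == OLD_PREFIX and local_id.startswith(EMBEDDED):
        return f"{NEW_PREFIX}:{local_id[len(EMBEDDED):]}"
    return xref


if __name__ == "__main__":
    for xref in ["NIF_Subcellular:nlx_subcell_100315", "GO:0005634"]:
        print(xref, "->", rewrite_xref(xref))
```

Since the two prefixes expand to different namespaces, this changes the identifier's expanded URI, not just how the CURIE is spelled.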

shawntanzk commented 2 years ago

Plan:

1) Start by creating a table with two columns, prefix and source

2) Remove the prefixes we don't want

3) Register the remaining prefixes with the Bioregistry (if a Bioregistry prefix already exists, use it instead)
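The plan could be sketched as a small triage step. This is only an illustration: `PrefixRow` and `triage` are hypothetical names, `registered` stands in for a real Bioregistry lookup (assumed to hold lowercase prefixes), and the example sets are assumptions. ABA goes in `drop` because the thread proposes replacing ABA xrefs with MBA; `mba` being registered is assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class PrefixRow:
    prefix: str
    source: str  # step 1: the resource the prefix refers to


def triage(rows, drop, registered):
    """Assign each row to a plan step (hypothetical helper)."""
    plan = {"drop": [], "use_existing": [], "register": []}
    for row in rows:
        if row.prefix in drop:  # step 2: remove
            plan["drop"].append(row.prefix)
        elif row.prefix.lower() in registered:  # step 3: reuse the existing prefix
            plan["use_existing"].append(row.prefix)
        else:  # step 3: register a new prefix
            plan["register"].append(row.prefix)
    return plan


if __name__ == "__main__":
    rows = [
        PrefixRow("ABA", "Allen Brain Atlas (first OWL version)"),
        PrefixRow("MBA", "Allen Mouse Brain Atlas"),
        PrefixRow("NLX", "NeuroLex"),
    ]
    print(triage(rows, drop={"ABA"}, registered={"mba"}))
```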

cthoyt commented 2 years ago

I added several Allen Brain Atlas prefixes suggested by @patrick-lloyd-ray in https://github.com/biopragmatics/bioregistry/commit/1cfd0ff4788940974d20548228c2c877d0d7df55 (though not ABA since Chris said these should be upgraded)

They will also have a collection at https://bioregistry.io/collection/0000005 that will go live with the nightly update tonight

cthoyt commented 2 years ago

> @tgbugs we of course wouldn't have NLXSUB in Uberon, but GO uses xrefs like NIF_Subcellular:nlx_subcell_100315. If NLXSUB is your preferred prefix, let's register it and use it in GO!

I am very worried about stuff like this because of the amount of redundant prefix usage. Why isn't this just NIF_Subcellular:00315?

shawntanzk commented 2 years ago

Thanks heaps @cthoyt, sorry I haven't managed to get around to doing this; really appreciate the help! :)

dosumis commented 2 years ago

> As far as I know we don't have official prefixes for the Allen atlases yet, @dosumis? As soon as we get these, we should add them to the Bioregistry. But until then these xrefs are vital.

I guess that means requesting OBO Library status for the ontologised versions of the ABA structure graphs. As these are unlikely to meet the QC requirements (e.g. they will never have text definitions), isn't that unlikely to succeed?

cmungall commented 2 years ago

I think we can add them to the Bioregistry independently (I think the URLs resolve?).

But ideally they would also be deposited regularly in something like OLS, BioPortal, or Ontobee.

cthoyt commented 2 years ago

@cmungall yes already done

patrick-lloyd-ray commented 2 years ago

> I guess that means requesting OBO Library status for the ontologised versions of the ABA structure graphs. As these are unlikely to meet the QC requirements (e.g. they will never have text definitions), isn't that unlikely to succeed?

I'm open to getting these to meet QC and in obolibrary, if there is community interest.

matentzn commented 2 years ago

You can secure a prefix in bioregistry w/o being an ontology and/or in OBO!

tgbugs commented 2 years ago

> I am very worried about stuff like this because of the amount of redundant prefix usage. Why isn't this just NIF_Subcellular:00315?

Because the expansion is completely different.

@prefix NIFSUB: <http://ontology.neuinfo.org/NIF/BiomaterialEntities/NIF-Subcellular.owl#> .

The NIF_Subcellular prefix expands to ancient fragment-based identifiers, which cannot be resolved by the server (they make bad assumptions about the design of the document and of the system that hosts the ontology IDs) and which redirect to a proper resolver via a bit of JavaScript.
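A quick way to see the difference tgbugs describes, using only Python's standard `urllib.parse`: HTTP clients do not send the fragment (the part after `#`) to the server, so every fragment-style NIF_Subcellular URI produces the same request path. The two URIs below are the expansions of `NIF_Subcellular:nlx_subcell_100315` and `NLXSUB:100315` under the prefix declarations given in this thread.

```python
from urllib.parse import urlsplit

# Expansions of the same local ID under the two prefixes discussed here.
fragment_style = (
    "http://ontology.neuinfo.org/NIF/BiomaterialEntities/NIF-Subcellular.owl#nlx_subcell_100315"
)
path_style = "http://uri.neuinfo.org/nif/nifstd/nlx_subcell_100315"

for uri in (fragment_style, path_style):
    parts = urlsplit(uri)
    # Only the path (and query) reach the server; the fragment stays client-side.
    print(f"server sees path {parts.path!r}, fragment {parts.fragment!r}")
```

Every NIF_Subcellular term therefore maps to the path `/NIF/BiomaterialEntities/NIF-Subcellular.owl`, and resolving an individual term relies on client-side handling such as the JavaScript redirect mentioned above.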

github-actions[bot] commented 1 year ago

This issue has not seen any activity in the past 6 months; it will be closed automatically one year from now if no action is taken.

cthoyt commented 1 year ago

Note that the ontology quality assessment toolkit site is now auto-generated weekly. The most up-to-date version for UBERON is at https://biopragmatics.github.io/oquat/unknowns/source/uberon

matentzn commented 1 year ago

@anitacaron my advice: the next time you do a push on Uberon, just drop all the references that OQUAT lists with a count of 5 or less. This will clean up the situation significantly. UBERONREF is silly as well.
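A minimal sketch of this cleanup rule, using a few (prefix, count) rows copied from the table at the top of the issue (counts are from that snapshot and may have changed since). `low_count_prefixes` is a hypothetical helper. One wrinkle visible in the table: some prefixes appear in several spellings (Bgee/bgee, OBOL/Obol), so whether variants are merged first changes the result.

```python
from collections import Counter

# A few (prefix, count) rows copied from the table at the top of this issue.
ROWS = [
    ("UBERONREF", 45),
    ("Bgee", 5),
    ("bgee", 1),
    ("BrainInfo", 4),
    ("Geisha", 7),
    ("Giesha", 1),
]

THRESHOLD = 5  # drop prefixes with a count of 5 or less


def low_count_prefixes(rows, threshold=THRESHOLD, fold_case=False):
    """Return the sorted prefixes whose total count is <= threshold."""
    counts = Counter()
    for prefix, count in rows:
        counts[prefix.casefold() if fold_case else prefix] += count
    return sorted(p for p, c in counts.items() if c <= threshold)


if __name__ == "__main__":
    print(low_count_prefixes(ROWS))
    print(low_count_prefixes(ROWS, fold_case=True))
```

With case folding, Bgee totals 6 and no longer falls under the threshold. Likely misspellings such as Giesha (for Geisha) are not caught by folding and would still need a manual look, and UBERONREF has to be added to the drop list explicitly.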

github-actions[bot] commented 1 year ago

This issue has not seen any activity in the past 6 months; it will be closed automatically one year from now if no action is taken.

anitacaron commented 9 months ago

@cthoyt can we please get an updated version of the table in the description?

cthoyt commented 9 months ago

@anitacaron yes, I updated the OQUAT website, added the code that generates the table, and updated the table at the top of the issue. FYI, the latest available JSON version of the ontology is from the end of October.
