Open cthoyt opened 2 years ago
I think I might know some of these, as they look like Allen Institute-related prefixes:
- DHBA = developing human brain atlas (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-dhba.obo)
- HBA = human brain atlas (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-hba.obo)
- DMBA = developing mouse brain atlas (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-dmba.obo)
- MBA = mouse brain atlas (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-mba.obo)
- ABA = Allen Brain Atlas; this is the name for all of the brain atlases at Allen (https://en.wikipedia.org/wiki/Allen_Brain_Atlas), see also https://github.com/obophenotype/ABA_Uberon
- PBA = primate brain atlas (non-human) (https://github.com/obophenotype/uberon/blob/master/source-ontologies/allen-pba.obo)
I don't know the history of their usage very well (these prefixes pre-date my working at the Allen Institute), but I know they were used in some mapping projects and we still have a need for them. I think I also recognize these two:
- NLX = neurolex
- nlx_subcell = neurolex subcellular structure
I don't know how these were used, but @tgbugs would probably know.
@patrick-lloyd-ray that's an excellent start! Thanks so much. For anyone who might be able to provide more context, I'm also looking for web references describing what these things are and, if possible, links to lists of the terms that go in each, or to ontology files if they exist.
Here are the mappings that I have.
@prefix MBA: <http://api.brain-map.org/api/v2/data/Structure/> .
@prefix HBA: <http://api.brain-map.org/api/v2/data/Structure/> .
@prefix DHBA: <http://api.brain-map.org/api/v2/data/Structure/> .
@prefix DMBA: <http://api.brain-map.org/api/v2/data/Structure/> .
@prefix NLX: <http://uri.neuinfo.org/nif/nifstd/nlx_> .
@prefix NLXSUB: <http://uri.neuinfo.org/nif/nifstd/nlx_subcell_> .
ABA refers to the very first version of the terminology, which was modelled in OWL using subClassOf instead of partOf. https://bioportal.bioontology.org/ontologies/ABA-AMB I think the mapping is
@prefix ABA: <http://mouse.brain-map.org/atlas/index.html#> .
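To make these mappings concrete, here is a minimal sketch of CURIE expansion using only the prefixes listed in this comment. The `expand` and `prefixes_for` helpers and the local identifier `997` are hypothetical, chosen for illustration, and the ABA entry is the tentative mapping given just above. The sketch also shows a consequence of MBA, HBA, DHBA, and DMBA sharing one URI prefix: once expanded, a URI no longer says which atlas it came from.

```python
# Prefix map copied from the @prefix lines in this comment.
# The ABA entry is the tentative ("I think") mapping.
PREFIX_MAP = {
    "MBA": "http://api.brain-map.org/api/v2/data/Structure/",
    "HBA": "http://api.brain-map.org/api/v2/data/Structure/",
    "DHBA": "http://api.brain-map.org/api/v2/data/Structure/",
    "DMBA": "http://api.brain-map.org/api/v2/data/Structure/",
    "NLX": "http://uri.neuinfo.org/nif/nifstd/nlx_",
    "NLXSUB": "http://uri.neuinfo.org/nif/nifstd/nlx_subcell_",
    "ABA": "http://mouse.brain-map.org/atlas/index.html#",
}


def expand(curie: str) -> str:
    """Expand a CURIE such as ``MBA:997`` into a full URI (hypothetical helper)."""
    prefix, _, local_id = curie.partition(":")
    return PREFIX_MAP[prefix] + local_id


def prefixes_for(uri: str) -> list[str]:
    """Return every prefix whose URI prefix matches the start of ``uri``."""
    return sorted(p for p, base in PREFIX_MAP.items() if uri.startswith(base))


if __name__ == "__main__":
    uri = expand("MBA:997")  # 997 is an illustrative local identifier
    print(uri)
    # Several prefixes match, so compressing the URI back to a CURIE is ambiguous.
    print(prefixes_for(uri))
```

Because the four Allen prefixes expand identically, any tool that round-trips URIs back to CURIEs would need extra information (for example, the source ontology) to pick the right prefix.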
Thanks @tgbugs and @patrick-lloyd-ray this is correct
I think we can replace all ABA xrefs and use MBA instead
As far as I know we don't have official prefixes for the allen atlases yet, @dosumis? As soon as we get these we should add to bioregistry. But until then these xrefs are vital.
@tgbugs we of course wouldn't have NLXSUB in uberon but GO uses xrefs like NIF_Subcellular:nlx_subcell_100315 - if NLXSUB is your preferred prefix let's register it and use it in GO!
Plan:
1) Start with creating a table
| prefix | source |
|---|---|
2) remove those we don't want
3) register with bioregistry those that stay in (if not use bioregistry prefix)
I added several Allen Brain Atlas prefixes suggested by @patrick-lloyd-ray in https://github.com/biopragmatics/bioregistry/commit/1cfd0ff4788940974d20548228c2c877d0d7df55 (though not ABA since Chris said these should be upgraded)
They will also have a collection at https://bioregistry.io/collection/0000005 that will go live with the nightly update tonight
> @tgbugs we of course wouldn't have NLXSUB in uberon but GO uses xrefs like NIF_Subcellular:nlx_subcell_100315 - if NLXSUB is your preferred prefix let's register it and use it in GO!
I am very worried about stuff like this because of the amount of redundant prefix usage. Why isn't this just NIF_Subcellular:00315?
Thanks heaps @cthoyt - sorry I haven't managed to get around to doing this, really appreciate the help! :)
> As far as I know we don't have official prefixes for the allen atlases yet, @dosumis? As soon as we get these we should add to bioregistry. But until then these xrefs are vital.
I guess that means trying to request obolibrary status for the ontologised versions of ABA structuregraphs. As these are unlikely to fulfill QC required (e.g. they will never have text defs), isn't this unlikely?
I think we can add to bioregistry independently (I think the URLs resolve..?)
but ideally they could be regularly deposited on something like OLS/BP/Ontobee too
@cmungall yes already done
> I guess that means trying to request obolibrary status for the ontologised versions of ABA structuregraphs. As these are unlikely to fulfill QC required (e.g. they will never have text defs), isn't this unlikely?
I'm open to getting these to meet QC and in obolibrary, if there is community interest.
You can secure a prefix in bioregistry w/o being an ontology and/or in OBO!
> I am very worried about stuff like this because of the amount of redundant prefix usage. Why isn't this just NIF_Subcellular:00315?
Because the expansion is completely different.
@prefix NIFSUB: <http://ontology.neuinfo.org/NIF/BiomaterialEntities/NIF-Subcellular.owl#> .
The NIF_Subcellular prefix expands to ancient fragment-based identifiers that cannot be resolved by the server (they make bad assumptions about the design of the document and of the system that hosts the ontology ids) and which redirect via a bit of javascript to a proper resolver.
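The difference in expansion can be sketched as follows. This assumes that the `NIF_Subcellular` prefix used in GO xrefs corresponds to the `NIFSUB` URI prefix given above, and it reuses the `NLXSUB` URI prefix from earlier in the thread; `expand` is a hypothetical helper, not part of any library.

```python
# URI prefixes quoted in this thread. Treating NIF_Subcellular as the
# NIFSUB expansion is an assumption made for illustration.
PREFIX_MAP = {
    "NIF_Subcellular": "http://ontology.neuinfo.org/NIF/BiomaterialEntities/NIF-Subcellular.owl#",
    "NLXSUB": "http://uri.neuinfo.org/nif/nifstd/nlx_subcell_",
}


def expand(curie: str) -> str:
    """Expand a CURIE into a full URI using PREFIX_MAP (hypothetical helper)."""
    prefix, _, local_id = curie.partition(":")
    return PREFIX_MAP[prefix] + local_id


if __name__ == "__main__":
    # Fragment-based identifier: everything after '#' is never sent to the server.
    print(expand("NIF_Subcellular:nlx_subcell_100315"))
    # Path-based identifier: the whole identifier is part of the request path.
    print(expand("NLXSUB:100315"))
```

The two CURIEs name the same local identifier but expand to different URIs, which is why shortening the local part under `NIF_Subcellular` would not yield the path-based form.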
This issue has not seen any activity in the past 6 months; it will be closed automatically one year from now if no action is taken.
Note that the ontology quality assessment toolkit site is now auto-generated weekly. The most up-to-date version for UBERON is at https://biopragmatics.github.io/oquat/unknowns/source/uberon
@anitacaron my advice: next time you do a push on Uberon, just drop all the references that oquat lists with a count of 5 or less. This will clean up the situation significantly. UBERONREF is silly as well.
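One way to read that suggestion, as a sketch: given per-prefix counts like those in the OQUAT table, collect the prefixes that occur five times or fewer as candidates for removal. Reading "5 or less" as a count threshold is an assumption, `rare_prefixes` is a hypothetical helper, and the counts below are made-up placeholders, not real OQUAT output.

```python
from collections import Counter


def rare_prefixes(counter: Counter, threshold: int = 5) -> list[str]:
    """Return prefixes whose count is at most ``threshold``, sorted by name."""
    return sorted(prefix for prefix, count in counter.items() if count <= threshold)


if __name__ == "__main__":
    # Placeholder counts for illustration only.
    example_counts = Counter({"PREFIX_A": 120, "PREFIX_B": 5, "PREFIX_C": 1})
    print(rare_prefixes(example_counts))
```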
@cthoyt can we please get an updated version of the table in the description?
@anitacaron yes, I updated the OQUAT website, added the code that generates the table, and updated the table at the top of the issue. FYI, the latest available JSON version of the ontology is from the end of October.
Ref:
from collections import Counter

import requests
from tabulate import tabulate


def main():
    # Latest OQUAT results for Uberon, published as JSON.
    url = "https://raw.githubusercontent.com/biopragmatics/oquat/main/results/uberon.json"
    data = requests.get(url).json()
    counter = Counter()  # prefix -> number of unknown references
    examples = {}  # prefix -> one example (node, value) pair
    # Use a separate loop variable so the parsed JSON in `data` is not shadowed.
    for result in data["results"].values():
        for key in ["synonym_pack", "prov_pack", "xref_pack"]:
            for prefix, uri_to_value_dict in result[key]["unknown_prefixes"].items():
                counter[prefix] += len(uri_to_value_dict)
                examples[prefix] = list(uri_to_value_dict.items())[0]
    # One row per prefix, most frequent first.
    rows = [(prefix, count, *examples[prefix]) for prefix, count in counter.most_common()]
    print(
        tabulate(
            rows, headers=["prefix", "count", "example_node", "example_val"], tablefmt="github"
        )
    )


if __name__ == "__main__":
    main()
The following prefixes show up in various places in UBERON but they are not in the Bioregistry, based on the OQUAT analysis in https://biopragmatics.github.io/oquat/unknowns/source/uberon and https://biopragmatics.github.io/oquat/invalids/source/uberon:
Generated by the following code:
Any help figuring out what these are and how they're used would be appreciated!