renaud / neuroNER

named entity recognizer for neuronal cells, based on UIMA Ruta rules
GNU Lesser General Public License v3.0

migrate to UBERON for brain regions and synonyms #54

Open stripathy opened 8 years ago

stripathy commented 8 years ago

we can get them here: http://www.ebi.ac.uk/ols/beta/ontologies/uberon

tgbugs commented 8 years ago

In process on this already. Building a mapping to UBERON against existing identifiers and will generate a bridge file with the additional synonyms. This doesn't entirely resolve the issues around reconciling the Allen brain parcellation scheme and importing/creating a set of 'partially overlaps with' relations to the UBERON terms. We have UBERON loaded on our SciGraph instances (e.g. http://matrix.neuinfo.org:9000/scigraph/vocabulary/term/brain) and I've automated most of the matching process.

stripathy commented 8 years ago

@cmungall and @mellybelly, can you advise on the best way to add/recommend synonyms in batch to multiple terms for UBERON?

Context: @tgbugs has very nicely put together a text file that compares the synonyms for region terms contained in UBERON against the hand-generated list of synonyms that @renaud and I put together (https://github.com/renaud/neuroNER/blob/master/ontology_merge/aba_uberon_syn_review.txt). Specifically, this file was generated by finding UBERON terms with cross-references to the Allen Mouse Brain Atlas and then concatenating these with the synonyms from here: https://github.com/renaud/neuroNER/blob/master/resources/bluima/neuroner/hbp_brainregions_aba-syn.obo

For the majority of region terms, we'll be happy to simply use the UBERON terms and synonyms (on average, they seem much better than what we've been using previously). But in a few cases I'll want to propose new synonyms for UBERON terms or remove terms if they seem incorrect or not specific enough for our text-mining purposes. For example, in the entry below, I think it makes sense to consider removing "optic" from the list of synonyms for optic nerve:

UBERON:0000941 cranial nerve II
MBA:848 optic nerve
------ABA SYNS------
IIn
Nerve II
Nervus opticus
optic nerve
second cranial nerve
-----UBERON SYNS-----
02 optic nerve
2n
CN-II
cranial II
nerve II
nervus opticus
nervus opticus [II]
optic
optic II
optic II nerve
optic nerve
optic nerve [II]
second cranial nerve
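The comparison described in this comment pairs each UBERON term that carries an Allen Mouse Brain Atlas cross-reference with the hand-curated ABA synonyms for the same region. A minimal sketch of that join follows, assuming both sources are already loaded into plain dictionaries. The data layout and the name `build_review_rows` are hypothetical; this is not the script that produced the review file.

```python
# Hypothetical in-memory data. The real inputs would be the UBERON release
# and hbp_brainregions_aba-syn.obo; only a few synonyms are shown here.
uberon_terms = {
    "UBERON:0000941": {
        "name": "cranial nerve II",
        "xrefs": ["MBA:848", "FMA:50863"],
        "synonyms": ["optic", "optic nerve", "second cranial nerve"],
    },
}
aba_synonyms = {
    "MBA:848": {"name": "optic nerve", "synonyms": ["IIn", "Nerve II", "optic nerve"]},
}


def build_review_rows(uberon_terms, aba_synonyms, prefix="MBA:"):
    """Pair each UBERON term with the ABA entry it cross-references."""
    rows = []
    for uberon_id, term in sorted(uberon_terms.items()):
        for xref in term["xrefs"]:
            if xref.startswith(prefix) and xref in aba_synonyms:
                # Comparison is case-sensitive; normalise first if needed.
                uberon_syns = set(term["synonyms"])
                aba_syns = set(aba_synonyms[xref]["synonyms"])
                rows.append({
                    "uberon_id": uberon_id,
                    "aba_id": xref,
                    "aba_only": sorted(aba_syns - uberon_syns),
                    "uberon_only": sorted(uberon_syns - aba_syns),
                    "shared": sorted(aba_syns & uberon_syns),
                })
    return rows


review_rows = build_review_rows(uberon_terms, aba_synonyms)
for row in review_rows:
    print(row)
```

Splitting the synonyms into `aba_only`, `uberon_only` and `shared` makes it easier to spot candidates such as "optic" that exist on only one side.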

mellybelly commented 8 years ago

Are you using the synonym types? Optic is not likely to be an exact synonym of anything, and you can likely ignore non-exact types. See: https://github.com/obophenotype/uberon/wiki/Using-uberon-for-text-mining



cmungall commented 8 years ago

Note we already have mappings to ABA. The mappings to the original (adult mouse) ABA can be found as cross-references using the prefix ABA. When Allen expanded to include other atlases, we created new mappings with new prefixes to better reflect the content:

The mappings are available either as xrefs or as logical bridging axioms.

See: https://github.com/obophenotype/uberon/issues/609

tgbugs commented 8 years ago

@mellybelly I pulled everything that we map to synonym in our scigraph instance, so there will be broad synonyms in there as well. @stripathy I can get you a version with more accurate mappings if need be.

@cmungall Thanks, I'll take a look under ABA in xrefs, didn't see them when I took a quick look.

cmungall commented 8 years ago

> Are you using the synonym types?

I would strongly recommend using synonym scopes. In general we are quite liberal with non-exact synonyms, but very careful with EXACT. See the doc @mellybelly references.

> Optic is not likely to be an exact synonym of anything, and you can likely ignore non-exact types. See: https://github.com/obophenotype/uberon/wiki/Using-uberon-for-text-mining

Correct, we currently have "optic" as a related synonym, with provenance to TAO:0000435 (which itself got it from ZFA). We're open to removing some of these, but our preferred strategy is to add metadata that weakens the synonym, allowing NER tools to adopt a range of strategies.
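One way an NER tool could adopt "a range of strategies" is to treat the synonym scope as a filter. The sketch below assumes the synonyms are already available as (text, scope) pairs. The strategy names and `select_synonyms` are hypothetical, and the sample scopes follow the UBERON entry for cranial nerve II.

```python
# The four scopes come from the OBO format; the strategy names are made up
# for this sketch.
STRATEGIES = {
    "strict": {"EXACT"},
    "moderate": {"EXACT", "NARROW"},
    "lenient": {"EXACT", "NARROW", "BROAD", "RELATED"},
}

# (synonym text, scope) pairs for UBERON:0000941, cranial nerve II.
synonyms = [
    ("second cranial nerve", "EXACT"),
    ("optic nerve", "BROAD"),
    ("optic", "RELATED"),
]


def select_synonyms(synonyms, strategy):
    """Keep only the synonyms whose scope the chosen strategy allows."""
    allowed = STRATEGIES[strategy]
    return [text for text, scope in synonyms if scope in allowed]


print(select_synonyms(synonyms, "strict"))
print(select_synonyms(synonyms, "lenient"))
```

With the "strict" strategy the ambiguous "optic" never reaches the matcher, and nothing has to be deleted from the ontology itself.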

stripathy commented 8 years ago

> @stripathy I can get you a version with more accurate mappings if needs be.

Yes please. It'd be helpful to know which UBERON synonyms are exact, broad, etc. so I can make a decision on what we should use for text-mining.

Thanks for clarifying @cmungall and @mellybelly, I'm happy to take on the role of going through the ABA MBA terms and synonyms and making a judgement call on which are good enough for NER and weakening/removing synonyms as necessary.

cmungall commented 8 years ago

@tgbugs - you should probably use the MBA xrefs, since this is effectively a 'new' version of the original ABA. (yes, it's not good practice to version in this way, but given the differences between them, and the disappearance of many IDs, and the persistence of the original ABA in sources like neurolex, I thought it good to keep them around as a record. Really we should have the allen artefacts versioned according to our standards in GitHub; they are somewhat in the source-ontologies dir in the uberon repo)

(EDIT: oh I see you meant the MBA ones, I'll leave this comment here though just so we're clear)
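Picking out the cross-references for one Allen atlas comes down to matching on the xref prefix. The sketch below uses a few xrefs copied from the UBERON entry for cranial nerve II. The helper `xrefs_with_prefix` is a hypothetical name, not part of any UBERON tooling.

```python
def xrefs_with_prefix(xrefs, prefix):
    """Return the xrefs whose ID space is exactly `prefix`."""
    # Match on "PREFIX:" at the start so that "MBA" does not pick up "MA:"
    # and "HBA" does not pick up "DHBA:".
    wanted = prefix + ":"
    return [xref for xref in xrefs if xref.startswith(wanted)]


xrefs = ["DHBA:15544", "FMA:50863", "HBA:9307", "MA:0001097", "MBA:848"]

print(xrefs_with_prefix(xrefs, "MBA"))
print(xrefs_with_prefix(xrefs, "HBA"))
```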

tgbugs commented 8 years ago

@cmungall, a broader question, do you recommend using different ObjectProperties to represent different synonym scopes or is it possible to use annotations on the edges themselves (e.g. [ rdf:type owl:Axiom ; owl:annotatedSource obo:UBERON_5111983 ; owl:annotatedProperty oboInOwl:hasRelatedSynonym; owl:annotatedTarget "manual digit 8"^^xsd:string ; oboInOwl:hasDbXref <http://orcid.org/0000-0002-6601-2165> ; oboInOwl:hasSynonymType core2:COMPARATIVE_PREFERRED] .) but include a myont:synonymScope myont:broad annotation, or does that make it hard to filter (for example in scigraph)?

@stripathy I'll push updated versions directly.

mellybelly commented 8 years ago

@tgbugs I would not remove synonyms that are not EXACT, but rather type them in SciGraph. @jnguyenx do we have doc on this?

@stripathy you can look very easily in the Uberon ontology file to see the synonym types. This is especially easy to see in the OBO format. See http://purl.obolibrary.org/obo/uberon/ext.owl. Here is an example:

[Term]
id: UBERON:0000941
name: cranial nerve II
def: "Cranial nerve fiber tract which is comprised of retinal ganglion cell axons running posterior medially towards the optic chiasm, at which some of the axons cross the midline and after which the structure is termed the optic tract. Transmits visual information from the retina to the brain[ZFA]." [ISBN:0471209627, ISBN10:0471888893]
subset: efo_slim
subset: pheno_slim
subset: uberon_slim
subset: vertebrate_core
synonym: "02 optic nerve" EXACT [AAO:0010345]
synonym: "2n" BROAD ABBREVIATION [http://uri.neuinfo.org/nif/nifstd/birnlex_1640, NIFSTD:NeuroNames_abbrevSource]
synonym: "CN-II" RELATED [ZFA:0000435]
synonym: "cranial II" EXACT []
synonym: "nerve II" RELATED [NeuroNames:289]
synonym: "nervus opticus" EXACT []
synonym: "nervus opticus [II]" EXACT LATIN [FMA:50863, FMA:TA]
synonym: "optic" RELATED [TAO:0000435]
synonym: "optic II" EXACT [EHDAA2:0001313]
synonym: "optic II nerve" EXACT [EHDAA2:0001313]
synonym: "optic nerve" BROAD SENSU [FMA:50863, ZFA:0000435]
synonym: "optic nerve [II]" EXACT []
synonym: "second cranial nerve" EXACT []
xref: :C12761
xref: AAO:0010345
xref: BAMS:2n
xref: BAMS:IIn
xref: BAMS:nII
xref: birnlex:1640
xref: CALOHA:TS-0713
xref: DHBA:15544
xref: EFO:0004258
xref: EHDAA:6788
xref: EHDAA2:0001313
xref: EMAPA:17575
xref: EMAPA:17846
xref: FMA:50863
xref: GAID:831
xref: HBA:9307
xref: http://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=289 {source="NIFSTD:birnlex_1640"}
xref: http://linkedlifedata.com/resource/umls/id/C0029130
xref: http://www.snomedbrowser.com/Codes/Details/180938001
xref: MA:0001097
xref: MBA:848
xref: MESH:D009900
xref: OpenCyc:Mx4rvVjLm5wpEbGdrcN5Y29ycA
xref: Optic:nerve
xref: TAO:0000435
xref: UMLS:C0029130 {source="NIFSTD:birnlex_1640"}
xref: VHOG:0000543
xref: XAO:0000188
xref: ZFA:0000435
is_a: UBERON:0011215 ! central nervous system cell part cluster
is_a: UBERON:0034713 ! cranial neuron projection bundle
relationship: develops_from UBERON:0003902 ! retinal neural layer
relationship: part_of UBERON:0001017 ! central nervous system
relationship: part_of UBERON:0002104 ! visual system
property_value: editor_note "Do not classify under 'cranial nerve', as this is not a true nerve - should be classified as evaginated sensory afferent[ISBN10:0471888893]" xsd:string
property_value: external_definition "A collection of nerve cells that project visual information from the eyes to the brain. (Source: BioGlossary, www.Biology-Text.com)[TAO]" xsd:string
property_value: external_definition "Fibrous, somatic sensory element covered by a fibrous connective-tissue sheath and is continuous with the layer of nerve cells on the inner surface of the eye.[AAO]" xsd:string
property_value: external_ontology_notes "(relaion to eye): MA, XAO, AAO and BTO consider this part of the eye. This is in contrast to GO, FMA, EHDAA2 (FMA has a class 'intra-ocular part of optic nerve' which represents the region of overlap). Relation to brain: part of diencephalon in EHDAA2, ZFA. In NIF, has the optic nerve root as part, which is a feature part of the diencphalon" xsd:string
property_value: homology_notes "(...) an essentially similar sequence of events occurs during the embryonic development of the vertebrate eye. The eye initially develops as a single median evagination of the diencephalon that soon bifurcates to form the paired optic vesicles. As each optic vesicle grows towards the body surface, its proximal part narrows as the optic stalk, and its distal part invaginates to form a two-layered optic cup (reference 1); The (optic) stalk persists as the optic nerve (reference 2).[well established][VHOG]" xsd:string
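Because each `synonym:` line in the OBO stanza carries its scope, the scopes can be read with a small parser and then used to filter. The sketch below works on a few lines copied from the entry above. The regular expression and `parse_synonyms` are illustrative and cover only the common `"text" SCOPE [TYPE] [xrefs]` layout, not the full OBO grammar.

```python
import re

# Matches: synonym: "text" SCOPE [optional TYPE] [comma-separated xrefs]
SYNONYM_RE = re.compile(
    r'^synonym:\s+"(?P<text>(?:[^"\\]|\\.)*)"\s+'
    r'(?P<scope>EXACT|BROAD|NARROW|RELATED)'
    r'(?:\s+(?P<type>[^\s\[]+))?'
    r'\s*\[(?P<xrefs>.*)\]\s*$'
)


def parse_synonyms(stanza_lines):
    """Extract text, scope, optional type and xrefs from synonym lines."""
    parsed = []
    for line in stanza_lines:
        match = SYNONYM_RE.match(line.strip())
        if not match:
            continue  # not a synonym line, or a layout this sketch ignores
        # Naive split: assumes no commas inside individual xrefs.
        xrefs = [x.strip() for x in match.group("xrefs").split(",") if x.strip()]
        parsed.append({
            "text": match.group("text"),
            "scope": match.group("scope"),
            "type": match.group("type"),
            "xrefs": xrefs,
        })
    return parsed


stanza = """\
id: UBERON:0000941
name: cranial nerve II
synonym: "02 optic nerve" EXACT [AAO:0010345]
synonym: "2n" BROAD ABBREVIATION [http://uri.neuinfo.org/nif/nifstd/birnlex_1640, NIFSTD:NeuroNames_abbrevSource]
synonym: "nervus opticus [II]" EXACT LATIN [FMA:50863, FMA:TA]
synonym: "optic" RELATED [TAO:0000435]
synonym: "second cranial nerve" EXACT []
""".splitlines()

synonyms = parse_synonyms(stanza)
exact_only = [s["text"] for s in synonyms if s["scope"] == "EXACT"]
print(exact_only)
```

For anything beyond a quick review, a dedicated OBO parsing library is likely a safer choice than a hand-written regular expression.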

cmungall commented 8 years ago

@tgbugs we treat each scope as a distinct annotation property (all as sub annotation properties of a general has-synonym property). On top of this, each assertion can be further annotated with additional information, including an open-ended list of "types" (we use this term as distinct from "scope" here, it's confusing...) such as ABBREVIATION.
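Since each scope is a distinct annotation property, a consumer can recover the scope from the predicate alone. The sketch below assumes the standard oboInOwl synonym properties and uses hand-written (subject, predicate, literal) tuples in place of a real RDF parser. `synonyms_by_scope` is a hypothetical helper, and the additional "types" attached as axiom annotations are not modelled here.

```python
OBOINOWL = "http://www.geneontology.org/formats/oboInOwl#"

# One annotation property per synonym scope.
SCOPE_BY_PROPERTY = {
    OBOINOWL + "hasExactSynonym": "EXACT",
    OBOINOWL + "hasBroadSynonym": "BROAD",
    OBOINOWL + "hasNarrowSynonym": "NARROW",
    OBOINOWL + "hasRelatedSynonym": "RELATED",
}

CRANIAL_NERVE_II = "http://purl.obolibrary.org/obo/UBERON_0000941"

# Hand-written triples standing in for the output of an RDF parser.
triples = [
    (CRANIAL_NERVE_II, OBOINOWL + "hasExactSynonym", "second cranial nerve"),
    (CRANIAL_NERVE_II, OBOINOWL + "hasRelatedSynonym", "optic"),
    (CRANIAL_NERVE_II, "http://www.w3.org/2000/01/rdf-schema#label", "cranial nerve II"),
]


def synonyms_by_scope(triples):
    """Group synonym literals by subject and scope; skip other predicates."""
    grouped = {}
    for subject, predicate, literal in triples:
        scope = SCOPE_BY_PROPERTY.get(predicate)
        if scope is None:
            continue
        grouped.setdefault(subject, {}).setdefault(scope, []).append(literal)
    return grouped


grouped = synonyms_by_scope(triples)
print(grouped[CRANIAL_NERVE_II])
```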

It would be great to have a bit better support in SciGraph, both for retrieval, and in the annotator. The challenge is that properties in neo4j are simple and can't be easily annotated. I thought we had a ticket, but the closest we have is https://github.com/SciGraph/SciGraph/issues/123

tgbugs commented 8 years ago

@cmungall Thanks. The reason I have been hesitant to go that route is exactly because of the lack of easy scigraph support for synonym granularity (not a very good reason).

With regard to ABA:

I am working on the equivalent of your obo over here. It will replace what we have in neurolex for the ABA parcellation (lots of neuro groups use only MBA and nothing else, so we need to have those identifiers in as first class entities). I'm managing provenance by using the original IRIs (might lead to some confusion during curie expansion down the road).

Ultimately what I am going to do is wipe nifga completely (deprecate everything) and port our brain partonomy over to an uberon bridge (see SciCrunch/NIF-Ontology/issues/35). What I do not have are your deprecated classes (altids?), which seem to be a bit different from what I see in neurolex; I don't see the abbreviations that you are using anywhere in our stuff. I'm extracting all of neurolex, curating it, and porting it back into nifstd, so if we have identifiers that are lingering I will probably deprecate them directly or just set it up so that their old IRIs redirect to the updated entities. Once that is done you can hopefully drop the old ABA stuff if you want.

tgbugs commented 8 years ago

@stripathy update to syn review was pushed.

cmungall commented 8 years ago

ABA - when you say original URIs, do you mean original as in the ones Allen assigned, the ones Neurolex assigned, or the ones we assigned in Uberon?

Not totally sure which deprecated classes you mean: you mean the ones that were lost between the two versions of ABA?

Curious as to your neurolex extraction procedure. I always found it v difficult to get things out. https://github.com/cmungall/nlx-pl (in prolog...)

renaud commented 8 years ago

prolog FTW!

tgbugs commented 8 years ago

@cmungall original as in the ones that Allen assigned. For deprecated yes, the ones lost between the two versions, though I haven't checked them thoroughly.

I'm using the built-in MediaWiki 'ask' query (http://neurolex.org/wiki/Special:Ask) to pull stuff out by supplying all the property types. The only problem is that the offset and number of results are limited, so I can't get more than 10k results, but I'm looking into fixing that. You can see the code I use here.
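The offset/limit paging described here can be sketched generically. This is not the code referred to above: `fetch_all` and `fake_fetch_page` are hypothetical, the page size is arbitrary, and the fake fetcher stands in for a real request to the wiki. The `max_offset` cap models the 10k ceiling mentioned in the comment.

```python
def fetch_all(fetch_page, page_size=500, max_offset=10_000):
    """Collect results page by page until a short page or the offset cap."""
    results = []
    offset = 0
    while offset < max_offset:
        page = fetch_page(offset=offset, limit=page_size)
        results.extend(page)
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size
    return results


# Stand-in data and fetcher so the sketch runs without network access.
FAKE_RESULTS = [f"Term {i}" for i in range(1234)]


def fake_fetch_page(offset, limit):
    return FAKE_RESULTS[offset:offset + limit]


all_results = fetch_all(fake_fetch_page)
print(len(all_results))
```

When the cap is hit, one possible workaround is to split the query into several narrower ones (for example by category) and page through each separately.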