renaud / neuroNER

named entity recognizer for neuronal cells, based on UIMA Ruta rules
GNU Lesser General Public License v3.0
variants of allen brain regions #57

Open tgbugs opened 8 years ago

tgbugs commented 8 years ago

- exact
- exact + related
- exact + related + narrow
- exact + related + narrow + broad

tgbugs commented 8 years ago

@stripathy done, they are in the ontology_merge folder. Did these in e, n, b, r order.

stripathy commented 8 years ago

My sense is that Uberon uses broad as the broadest category, but terms tagged as "related" synonyms seem to be good synonyms for each term. @tgbugs Can I get a version of this file with just exact, narrow, and related, but not broad? See example below.

UBERON:0014915 genu of facial nerve
MBA:1116 genu of the facial nerve
------ABA SYNS------
Genu internum n. faciales
Genu nervi facialis
Nervus facialis
first genu
gVIIn
genu of facial nerve
genu of the facial nerve
internal genu
internal genu of facial nerve
internal genu of the facial nerve
-----UBERON SYNS-----
--Broad--
first genu
--Exact--
genu nervi facialis
--Narrow--
internal genu of facial nerve
--Related--
Nervus facialis (Genu)
first genu
genu internum n. faciales
genu nervi facialis
genu of facial nerve
genu of the facial nerve
internal genu
internal genu of facial nerve
internal genu of the facial nerve
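
A minimal sketch of the requested filtering (keep exact, narrow, and related synonyms; drop broad), assuming the synonyms are stored as OBO-style `synonym:` tags. The function name `filter_synonyms`, the regular expression, and the empty `[]` xref lists in the sample are illustrative assumptions, not code or data from the repository.

```python
import re

# Matches an OBO-style synonym tag such as:
#   synonym: "first genu" BROAD []
# Tags without an explicit scope do not match and are therefore kept.
SYNONYM_RE = re.compile(
    r'^synonym:\s*"((?:[^"\\]|\\.)*)"\s+(EXACT|BROAD|NARROW|RELATED)\b'
)


def filter_synonyms(lines, keep_scopes=frozenset({"EXACT", "NARROW", "RELATED"})):
    """Yield lines unchanged, skipping synonym tags whose scope is not kept."""
    for line in lines:
        match = SYNONYM_RE.match(line.strip())
        if match and match.group(2) not in keep_scopes:
            continue  # e.g. a BROAD synonym
        yield line


# Hypothetical stanza built from the synonyms quoted above.
example = [
    "[Term]",
    "id: UBERON:0014915",
    "name: genu of facial nerve",
    'synonym: "first genu" BROAD []',
    'synonym: "genu nervi facialis" EXACT []',
    'synonym: "internal genu of facial nerve" NARROW []',
    'synonym: "genu of the facial nerve" RELATED []',
]
kept = list(filter_synonyms(example))
```

Note that "first genu" is listed under both Broad and Related in the example, so a scope-based filter like this one would still keep its RELATED entry.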

stripathy commented 8 years ago

Here's what I need to do to evaluate UBERON syns vs ABA syns:

  1. rename the synonym file to an appropriate file name so the sherlok code knows how to find it (maybe also rename identifiers from MBA to ABA for downstream code?)
  2. push new syn file to neuroNER github repo
  3. clean sherlok resources so new synonym file is downloaded
  4. run sherlok to annotate neuron name terms
  5. evaluate coverage of sherlok mappings using neuroNER code (i.e., run normalizing code to clean annotations and count how many region terms are in the UNKN_REGION "ontology" vs assigned to an ABA region). How many terms remain missing since they aren't mapped to any ontology?
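
Step 5 could be sketched roughly as follows. This assumes each normalized region annotation is an identifier string whose prefix names the ontology (for example `ABA:...` or `UNKN_REGION:...`); that string format, the function name `region_coverage`, and the sample values are assumptions for illustration, not the actual neuroNER output format.

```python
from collections import Counter


def region_coverage(annotations):
    """Summarize how many region annotations were mapped to a real ontology.

    `annotations` is an iterable of identifier strings; everything before the
    first colon is treated as the ontology prefix (an assumed convention).
    """
    counts = Counter(a.split(":", 1)[0] for a in annotations)
    total = sum(counts.values())
    unknown = counts.get("UNKN_REGION", 0)
    mapped = total - unknown
    return {
        "total": total,
        "mapped": mapped,
        "unknown": unknown,
        "fraction_mapped": mapped / total if total else 0.0,
    }


# Made-up sample annotations, only to show the call.
stats = region_coverage([
    "ABA:1116",
    "UNKN_REGION:basal forebrain",
    "ABA:1116",
    "UNKN_REGION:amygdala",
])
```

Running the same summary on the ABA-based and UBERON-based synonym files would give a like-for-like coverage comparison.
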

tgbugs commented 8 years ago

@stripathy here it is. I'll fix the order for the obo files tomorrow too.

stripathy commented 8 years ago

Sorry - I was hoping to get the .obo version of this file.

On Thu, Mar 3, 2016 at 12:43 AM Tom Gillespie wrote:

@stripathy here it is.

Shreejoy Tripathy Post-Doctoral Researcher Department of Psychiatry University of British Columbia

stripathy commented 8 years ago

@tgbugs - Can I also get a version of the .obo file for the entire UBERON anatomy ontology with all available synonyms? I'm finding that some terms used in the literature (e.g., basal forebrain) don't have 1-to-1 mappings with Allen MBA structures so aren't in the lists that you've shared, but they do have appropriate entities in UBERON.

In other words, my sense is that uberon is a great neuroanatomy ontology in that it has most terms that are actively used within the literature, but it doesn't always have crossrefs to a brain partonomy that we may care about, like Allen adult MBA.

tgbugs commented 8 years ago

@stripathy yep, I'll put one together.

tgbugs commented 8 years ago

@stripathy here is the enr syns file.

tgbugs commented 8 years ago

@stripathy I added 4 versions of uberon that have been stripped of everything except the id, name, and synonyms. This is all of uberon; if you need just the nervous system, I think Chris has a brain slim that I can run it on.
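
For readers who want to reproduce this kind of stripped file, here is a rough sketch of reducing an OBO file to `id`, `name`, and `synonym` tags. It is not the script that was actually used; `strip_obo` is a hypothetical helper, and it assumes the usual OBO layout of a `[Term]` stanza header followed by one `tag: value` pair per line. Header tags and non-Term stanzas such as `[Typedef]` are dropped entirely.

```python
KEEP_TAGS = ("id:", "name:", "synonym:")


def strip_obo(text):
    """Return OBO text containing only [Term] stanzas with id/name/synonym tags."""
    out = []
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            in_term = line == "[Term]"
            if in_term:
                if out:
                    out.append("")  # blank line between stanzas
                out.append("[Term]")
        elif in_term and line.startswith(KEEP_TAGS):
            out.append(line)
    return "\n".join(out) + "\n"


sample = """format-version: 1.2

[Term]
id: UBERON:0011775
name: vagus nerve nucleus
namespace: uberon
synonym: "vagal nucleus" EXACT []

[Typedef]
id: part_of
name: part of
"""
stripped = strip_obo(sample)
```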

stripathy commented 8 years ago

Using the 1K neuroelectro neuron types as a test set, I ran the neuroNER parsing using the original regions file with the ABA terms and synonyms and with the terms and synonyms from UBERON with all synonyms except broad. Results here:

My sense is that Uberon is pretty great relative to what we were using earlier, but in a few cases the appropriate term is missing a crossref to the MBA mouse brain atlas (examples include amygdala and olfactory bulb), so we'll have to add those in.

renaud commented 8 years ago

All this sounds like good news to me. @stripathy do you have some stats on the above csv results (to compare them with the former ABA OBO file)?

stripathy commented 8 years ago

@renaud - no, I never computed any stats on this file.

By eye, I could tell that in most cases what we had for ABA was identical to what UBERON has. UBERON was better in a few cases (it had more synonyms). In the few cases where you and I had manually added synonyms to the ABA file, UBERON didn't have those synonyms.

After discussing with @tgbugs, my plan is as follows: rerun the parsing using all (brain?) UBERON terms, even if they don't have an explicit database crossref to the Allen Mouse Brain Atlas (which is what we've been using within the /similarity code for region similarity matching). For a few regions we'll have to manually add database crossrefs to the Allen Mouse Brain Atlas.

tgbugs commented 8 years ago

@stripathy working on the brain subset now, it will be up shortly.

stripathy commented 8 years ago

@renaud is there a downside to having duplicated synonyms across different terms in the .obo files?

In the uberon brain slim file that @tgbugs provided, here are two different terms that share a synonym ("vagus nucleus"):

[Term]
id: UBERON:0002870
name: dorsal motor nucleus of vagus nerve
namespace: uberon
synonym: "dorsal efferent nucleus of vagus" EXACT [FMA:54592]
synonym: "dorsal motor nucleus" RELATED [NeuroNames:755]
synonym: "dorsal motor nucleus of the vagus" RELATED []
synonym: "dorsal motor nucleus of the vagus (vagal nucleus)" EXACT [DHBA:10N]
synonym: "dorsal motor nucleus of the vagus nerve" RELATED []
synonym: "dorsal motor nucleus of vagus" RELATED [BAMS:10]
synonym: "dorsal motor nucleus of vagus nerve" RELATED []
synonym: "dorsal motor nucleus of vagus X nerve" EXACT [MA:0001036]
synonym: "dorsal motor vagal nucleus" RELATED []
synonym: "dorsal nucleus of the vagus nerve" RELATED []
synonym: "dorsal nucleus of vagus nerve" RELATED []
synonym: "dorsal vagal nucleus" EXACT [NIFSTD:birnlex_2642]
synonym: "dorsal vagal nucleus" RELATED []
synonym: "nucleus alaris" EXACT [NIFSTD:birnlex_2642]
synonym: "nucleus alaris (Oertel)" RELATED LATIN [NeuroNames:755]
synonym: "nucleus dorsalis motorius nervi vagi" RELATED LATIN [NeuroNames:755]
synonym: "nucleus dorsalis nervi vagi" RELATED LATIN [NeuroNames:755]
synonym: "nucleus posterior nervi vagi" RELATED []
synonym: "nucleus vagalis dorsalis" RELATED LATIN [NeuroNames:755]
synonym: "posterior nucleus of vagus nerve" RELATED []
synonym: "vagus nucleus" RELATED []

[Term]
id: UBERON:0011775
name: vagus nerve nucleus
namespace: uberon
synonym: "nodosal nucleus" RELATED []
synonym: "nucleus of vagal nerve" EXACT []
synonym: "nucleus of vagal X nerve" EXACT []
synonym: "nucleus of vagus nerve" EXACT [FMA:54573]
synonym: "nucleus of Xth nerve" EXACT []
synonym: "tenth cranial nerve nucleus" EXACT [FMA:54573]
synonym: "vagal nucleus" EXACT []
synonym: "vagal X nucleus" EXACT []
synonym: "vagus nucleus" EXACT []

The first term has a db xref to the Allen adult mouse atlas (MBA) but the second doesn't. I'd prefer to keep both terms in the resource file (rather than manually going through and removing terms, or arbitrarily removing conflicting synonyms across terms). Just as we currently have multiple resource files for a single feature such as regions, in post-processing we can keep the terms that have better db xrefs (for example, an xref to MBA, which is our preferred region ontology).
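
The post-processing idea could look something like the sketch below: index every synonym, and when a mention matches several terms, prefer the ones carrying an MBA xref. The data layout (a dict of term id to `synonyms` and `xrefs` sets), the function names, and the `MBA:PLACEHOLDER` value are all assumptions for illustration; the real xref id is not shown in the slim stanzas above.

```python
from collections import defaultdict


def index_synonyms(terms):
    """Map each lower-cased synonym to the set of term ids that list it."""
    index = defaultdict(set)
    for term_id, term in terms.items():
        for synonym in term["synonyms"]:
            index[synonym.lower()].add(term_id)
    return index


def resolve(mention, terms, index, preferred_prefix="MBA:"):
    """Return candidate term ids, keeping only preferred-xref terms if any exist."""
    candidates = sorted(index.get(mention.lower(), ()))
    preferred = [
        term_id for term_id in candidates
        if any(x.startswith(preferred_prefix) for x in terms[term_id]["xrefs"])
    ]
    return preferred or candidates


# Abbreviated, partly hypothetical data based on the two stanzas above.
terms = {
    "UBERON:0002870": {
        "synonyms": {"vagus nucleus", "dorsal vagal nucleus"},
        "xrefs": {"MBA:PLACEHOLDER"},  # placeholder, not the real MBA id
    },
    "UBERON:0011775": {
        "synonyms": {"vagus nucleus", "vagal nucleus"},
        "xrefs": set(),
    },
}
index = index_synonyms(terms)
best = resolve("vagus nucleus", terms, index)
```

With this approach both terms stay in the resource file, and the ambiguity is settled only at lookup time.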