renaud / neuroNER

named entity recognizer for neuronal cells, based on UIMA Ruta rules
GNU Lesser General Public License v3.0

more effort to normalize brain regions #31

Open stripathy opened 9 years ago

stripathy commented 9 years ago

Use Leon French's codebase to help with this

pointers here from @leonfrench:

Using text mining to link journal articles to neuroanatomical databases. http://www.ncbi.nlm.nih.gov/pubmed/22120205

Here's the supplement: http://www.chibi.ubc.ca/faculty/paul-pavlidis/pavlidis-lab/data-and-supplementary-information/text-mining-of-journal-of-comparative-neurology/

It was all based on RDF for storing the data, and the code is on GitHub. A base package would be ubic.pubmedgate.resolve.RDFResolvers.

ubic.pubmedgate.resolve.focusedAnalysis.PrintAndResolveBrainRegions gives a good example. You could just edit that to remove the GATE loading code (I'm guessing you are not using GATE).

https://github.com/leonfrench/public/tree/master/PubMedIDtoGate

PrintAndResolveBrainRegions: https://github.com/leonfrench/public/blob/37b8ac6630bba08da23392136b75c6f74ac82953/PubMedIDtoGate/src/main/java/ubic/pubmedgate/resolve/focusedAnalysis/PrintAndResolveBrainRegions.java

stripathy commented 9 years ago

proposal: add flat lists of region terms back

stripathy commented 9 years ago

Wait for Leon to provide ontology mappings for the terms in these flat lists. If recall is still not good after that, we can use machine learning models.

leonfrench commented 9 years ago

Sorry for the delay. It took a while to get my code up and running, but I managed to roughly extend PrintAndResolveBrainRegions.

I mentioned to Shreejoy that I had extended the Allen Mouse Brain reference atlas (ABA) to include synonyms in the past, but looking at my files, I think I misspoke. It seems I only extended the BAMS lexicon/ontology. What I did was map the ABA terms to the NIFSTD terms and then bring the NIF synonym information into the ABA ontology. This crudely added 427 synonyms to the 910 ABA regions.
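For anyone who wants to redo the synonym transfer described above, here is a minimal Python sketch of the idea, not the code that was actually run. It assumes three plain dictionaries: ABA labels, an ABA-to-NIFSTD mapping, and NIF synonyms. The function name `add_synonyms` and all identifiers in the toy data are hypothetical and invented for illustration.

```python
from collections import defaultdict


def add_synonyms(aba_labels, aba_to_nif, nif_synonyms):
    """Return {aba_id: sorted synonyms} borrowed from the mapped NIF terms."""
    enriched = defaultdict(set)
    for aba_id, label in aba_labels.items():
        # An ABA term may map to zero, one, or several NIF terms.
        for nif_id in aba_to_nif.get(aba_id, ()):
            for synonym in nif_synonyms.get(nif_id, ()):
                # Skip synonyms that only repeat the ABA label itself.
                if synonym.lower() != label.lower():
                    enriched[aba_id].add(synonym)
    return {aba_id: sorted(syns) for aba_id, syns in enriched.items()}


# Toy data: these IDs and terms are made up, not taken from ABA or NIFSTD.
aba_labels = {"ABA:1": "Hippocampal formation", "ABA:2": "Cerebellum"}
aba_to_nif = {"ABA:1": ["NIF:10"], "ABA:2": ["NIF:20"]}
nif_synonyms = {
    "NIF:10": ["hippocampus", "Hippocampal Formation"],
    "NIF:20": [],
}

result = add_synonyms(aba_labels, aba_to_nif, nif_synonyms)
```

Regions that gain no new synonym are left out of the result, so counting the values gives the number of synonyms added.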

Also, for ABA, I'm using http://bioportal.bioontology.org/ontologies/ABA-AMB, which I think is very old. So you may want a newer Allen Mouse Atlas version.

I made a spreadsheet with the resulting mappings - I tried to add it to GitHub (I'm a beginner git[hub] user). The file is "regions_lfrench.mappings.csv".

Some manual curation should be done if you are going to use these mappings. First, we could reduce the one-to-many mappings, where a single mention maps to several Allen regions (some mentions are mapped to three Allen regions).
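As a starting point for that curation, the sketch below lists mentions that map to more than one Allen region. It is only an illustration: the column names `mention` and `allen_region` are assumptions, not the actual headers of "regions_lfrench.mappings.csv", and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def one_to_many(rows, mention_col="mention", region_col="allen_region"):
    """Return {mention: sorted regions} for mentions with more than one region."""
    regions = defaultdict(set)
    for row in rows:
        # Normalize case and whitespace so "Cortex" and "cortex " are grouped.
        mention = row[mention_col].strip().lower()
        regions[mention].add(row[region_col].strip())
    return {m: sorted(r) for m, r in regions.items() if len(r) > 1}


# Invented sample rows standing in for the real mappings file.
sample = io.StringIO(
    "mention,allen_region\n"
    "cortex,Isocortex\n"
    "cortex,Cerebral cortex\n"
    "thalamus,Thalamus\n"
)
ambiguous = one_to_many(csv.DictReader(sample))
```

To run it on the real file, open the CSV with `open(path, newline="")` and pass the actual column names once they are known.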