renaud / neuroNER

Named entity recognizer for neuronal cells, based on UIMA Ruta rules
GNU Lesser General Public License v3.0

weird capitalization effects for neuroNER tagging #24

Closed: stripathy closed this 9 years ago

stripathy commented 9 years ago

For the term 'thick tufted pyramidal cell', 'thick tufted' is identified, but for the query 'Thick Tufted pyramidal cell', 'Thick Tufted' is NOT identified.

This seems to be a fairly general effect, not limited to this one term.
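Because the all-lowercase form is tagged completely, one possible client-side workaround is to annotate a lowercased copy of the text and then take each span's covered text from the original input. This is a minimal sketch that assumes `annotate` returns tuples of `(begin, end, covered_text, type, properties)` like the output in this report. `annotate_lowercased` is a hypothetical helper and is not part of neuroNER or the Sherlok client. It has not been tested against neuroNER's other rules, and lowercasing could break rules that depend on capitalization, such as acronyms.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

# Annotation shape assumed from the output in this issue:
# (begin, end, covered_text, type, properties)
Annotation = Tuple[int, int, str, str, Dict[str, Any]]


def annotate_lowercased(
    annotate: Callable[[str], Iterable[Annotation]], text: str
) -> Iterator[Annotation]:
    """Hypothetical workaround: annotate a lowercased copy, report original text.

    Character offsets are reused unchanged, so lowercasing must not change
    the length of the string.
    """
    lowered = text.lower()
    if len(lowered) != len(text):
        # Some Unicode characters (e.g. 'İ') expand when lowercased,
        # which would shift every later offset.
        raise ValueError("lowercasing changed the text length; offsets would be wrong")
    for begin, end, _covered, ann_type, props in annotate(lowered):
        # Take the covered text from the original, capitalized input.
        yield begin, end, text[begin:end], ann_type, props
```

With the client from the report, this would be called as `annotate_lowercased(lambda t: s.annotate('neuroner', t), 'Thick Tufted pyramidal cell')`.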

# Assumed import path for the Sherlok Python client.
from sherlok import Sherlok

# Lowercase input: 'thick tufted' (0, 12) is found.
s = Sherlok()
annotations = list(s.annotate('neuroner', 'thick tufted pyramidal cell'))
for a in annotations:
    print(a)

(0, 12, 'thick tufted', u'Morphology', {u'ontologyId': u'HBP_MORPHOLOGY:0000014'})
(6, 12, 'tufted', u'Morphology', {u'ontologyId': u'HBP_MORPHOLOGY:0000031'})
(13, 22, 'pyramidal', u'Morphology', {u'ontologyId': u'HBP_MORPHOLOGY:0000001'})
(0, 27, 'thick tufted pyramidal cell', u'Neuron', {})
(6, 27, 'tufted pyramidal cell', u'Neuron', {})
(13, 27, 'pyramidal cell', u'Neuron', {})
(23, 27, 'cell', u'NeuronTrigger', {})
(0, 22, 'thick tufted pyramidal', u'PreNeuron', {})
(6, 22, 'tufted pyramidal', u'PreNeuron', {})
(13, 22, 'pyramidal', u'PreNeuron', {})

# Capitalized input: 'Thick Tufted' (0, 12) is missing.
s = Sherlok()
annotations = list(s.annotate('neuroner', 'Thick Tufted pyramidal cell'))
for a in annotations:
    print(a)

(6, 12, 'Tufted', u'Morphology', {u'ontologyId': u'HBP_MORPHOLOGY:0000031'})
(13, 22, 'pyramidal', u'Morphology', {u'ontologyId': u'HBP_MORPHOLOGY:0000001'})
(6, 27, 'Tufted pyramidal cell', u'Neuron', {})
(13, 27, 'pyramidal cell', u'Neuron', {})
(23, 27, 'cell', u'NeuronTrigger', {})
(6, 22, 'Tufted pyramidal', u'PreNeuron', {})
(13, 22, 'pyramidal', u'PreNeuron', {})
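If each annotation is reduced to its `(begin, end, type)` triple, a set difference shows exactly what the capitalized run loses. The sketch hardcodes the spans from the two runs in this report and only compares them. It does not call neuroNER. The result is that every annotation starting at 'Thick' (offset 0) disappears, while 'Tufted' alone (offsets 6 to 12) is still tagged with the same ontology id.

```python
# Spans copied from the two runs in this issue, reduced to (begin, end, type).
lowercase_run = {
    (0, 12, "Morphology"), (6, 12, "Morphology"), (13, 22, "Morphology"),
    (0, 27, "Neuron"), (6, 27, "Neuron"), (13, 27, "Neuron"),
    (23, 27, "NeuronTrigger"),
    (0, 22, "PreNeuron"), (6, 22, "PreNeuron"), (13, 22, "PreNeuron"),
}
capitalized_run = {
    (6, 12, "Morphology"), (13, 22, "Morphology"),
    (6, 27, "Neuron"), (13, 27, "Neuron"),
    (23, 27, "NeuronTrigger"),
    (6, 22, "PreNeuron"), (13, 22, "PreNeuron"),
}

text = "Thick Tufted pyramidal cell"

# Spans found for the lowercase query but not for the capitalized one.
missing = sorted(lowercase_run - capitalized_run)
for begin, end, ann_type in missing:
    print(begin, end, ann_type, repr(text[begin:end]))
# prints:
# 0 12 Morphology 'Thick Tufted'
# 0 22 PreNeuron 'Thick Tufted pyramidal'
# 0 27 Neuron 'Thick Tufted pyramidal cell'
```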