renaud / neuroNER

named entity recognizer for neuronal cells, based on UIMA Ruta rules
GNU Lesser General Public License v3.0

Encode the various ways of encoding a hyphen sign #38

Open stripathy opened 9 years ago

stripathy commented 9 years ago

I noticed that the following is matched in neuroNER:

GAD1-expressing cell

but not

GAD1–expressing cell

Note the difference in hyphen signs (they're actually different Unicode characters: the first is the ASCII hyphen-minus, the second is an en dash, U+2013).

In NeuroElectro, I replace weird Unicode hyphens (the minus sign and the en dash) with a normal '-' sign:

    import re
    # Replace the minus sign (U+2212) and the en dash (U+2013) with an ASCII '-'.
    newStr = re.sub(u'[\u2212\u2013]', '-', inStr)
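
The same idea could be widened to cover other hyphen look-alikes. Here is a rough Python sketch of that, not code from neuroNER or NeuroElectro: the function name `normalize_hyphens` and the exact set of code points are my assumptions, and the list is probably not exhaustive.

```python
import re

# Assumed, non-exhaustive set of hyphen look-alikes:
#   U+2010 hyphen, U+2011 non-breaking hyphen, U+2012 figure dash,
#   U+2013 en dash, U+2212 minus sign, U+FE63 small hyphen-minus,
#   U+FF0D fullwidth hyphen-minus
_HYPHEN_LIKE = re.compile("[\u2010-\u2013\u2212\uFE63\uFF0D]")


def normalize_hyphens(text):
    """Return text with each hyphen look-alike replaced by an ASCII '-'."""
    return _HYPHEN_LIKE.sub("-", text)


# The en dash variant from this issue becomes the form that already matches.
print(normalize_hyphens("GAD1\u2013expressing cell"))
```

Something equivalent would have to run on the input text before the Ruta rules see it, or the rules themselves would need to accept these characters wherever they currently expect '-'.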