stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

Named entity recognition doesn't identify entities that are used as hashtags #481

Open JamesFrost opened 7 years ago

JamesFrost commented 7 years ago

The named entity annotator fails to identify entities that are used as hashtags.

For example, for the input "I like Jeremy" it correctly identifies Jeremy as a named entity. For the input "I like #Jeremy", however, it does not.

Is this behaviour intended?

J38 commented 7 years ago

The system was not trained on social media data, so it never saw an example where a # started an entity. I think you raise a good point that we need to think about improved handling of social media text.

J38 commented 7 years ago

We could possibly add a step that processes tokens starting with # and removes the # for the purpose of named entity recognition tagging.
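
As a rough illustration of that idea, here is a minimal sketch in plain Java that strips a leading # from each token before the tokens would be handed to an NER tagger. It is not part of CoreNLP: the class name HashtagStripper and both method names are hypothetical, and it assumes the tokenizer delivers a hashtag as a single token such as "#Jeremy", which may not hold for every tokenizer configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a CoreNLP class: removes the leading '#'
// from hashtag tokens so a tagger would see the bare word.
class HashtagStripper {

    // Returns the token without its leading '#'. Tokens that do not start
    // with '#', and a bare "#", are returned unchanged.
    static String stripHashtag(String token) {
        if (token.length() > 1 && token.charAt(0) == '#') {
            return token.substring(1);
        }
        return token;
    }

    // Applies stripHashtag to every token, preserving order and length,
    // so position i in the result still corresponds to original token i.
    static List<String> stripHashtags(List<String> tokens) {
        List<String> stripped = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            stripped.add(stripHashtag(token));
        }
        return stripped;
    }

    public static void main(String[] args) {
        // Assumed tokenization of "I like #Jeremy".
        List<String> tokens = Arrays.asList("I", "like", "#Jeremy");
        System.out.println(stripHashtags(tokens)); // prints [I, like, Jeremy]
    }
}
```

Because the output list keeps the same length and order as the input, any labels produced for the stripped tokens could be mapped back onto the original hashtag tokens by index. Multi-word hashtags such as #JeremyCorbyn would still need separate handling.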