Open JamesFrost opened 7 years ago
The system was not trained on social media data, so it never saw an example where a # started an entity. I think you raise a good point: we need to think about improved handling of social media text.
We could possibly add something that processes tokens that start with # and removes the # for the purpose of named entity recognition tagging.
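As a rough illustration of that idea, here is a minimal Python sketch of stripping the leading # before tagging and then mapping the labels back onto the original tokens. It is not the project's actual implementation: the function names and the `tagger` callable (anything that takes a list of tokens and returns one label per token) are hypothetical stand-ins for whatever NER component is in use.

```python
from typing import Callable, List, Tuple


def strip_hashtags(tokens: List[str]) -> Tuple[List[str], List[bool]]:
    """Remove one leading '#' from hashtag tokens.

    Returns the cleaned tokens plus a parallel list of flags recording
    which tokens were hashtags. A bare '#' is left untouched.
    """
    cleaned = []
    was_hashtag = []
    for tok in tokens:
        is_tag = len(tok) > 1 and tok.startswith("#")
        cleaned.append(tok[1:] if is_tag else tok)
        was_hashtag.append(is_tag)
    return cleaned, was_hashtag


def tag_ignoring_hashtags(
    tokens: List[str],
    tagger: Callable[[List[str]], List[str]],  # hypothetical NER tagger
) -> List[Tuple[str, str]]:
    """Tag the cleaned tokens, then pair each label with the ORIGINAL token."""
    cleaned, _ = strip_hashtags(tokens)
    labels = tagger(cleaned)
    # Stripping never adds or drops tokens, so labels align one-to-one.
    return list(zip(tokens, labels))
```

Because the token count does not change, the labels can be attached to the original text without any offset bookkeeping. A real fix would also need to decide how to handle multi-word hashtags such as #JeremyCorbyn, which this sketch does not attempt to split.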
The named entity annotator fails to identify entities that are used as a hashtag.
For example,
I like Jeremy
would correctly identify Jeremy as a named entity. However,
I like #Jeremy
would not.
Is this behaviour intended?