stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0
9.64k stars 2.7k forks

CoreNLP recognizes ???/*** as number #767

Open anoopsingh opened 6 years ago

anoopsingh commented 6 years ago

If a sentence contains ????? or ****, CoreNLP tokenizes it and identifies it as Number, which should not happen.

manning commented 5 years ago

Believe it or not, I have been trying to fix this in my spare time. Fixing it generally, not just for these two cases, seems harder than one might hope, in part because numbers are very common and symbols are quite rare in the data. Maybe more later.

anoopsingh commented 5 years ago

Thanks @manning
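
For readers hitting the same behavior, one possible stopgap is to post-process the labels outside CoreNLP. The sketch below is only an illustration and is not the fix discussed in this thread. The class `SymbolRunGuard`, its methods, and the label strings `"NUMBER"` and `"O"` are all assumptions chosen for the example; substitute whatever labels your pipeline actually emits. It takes parallel lists of token strings and labels, and resets the number label on any token that contains no letter or digit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it is NOT part of CoreNLP.
final class SymbolRunGuard {

    // True when the token is non-empty and contains no letter or digit,
    // e.g. "?????" or "****". A token such as "42" or "3.5" returns false.
    static boolean isSymbolOnly(String token) {
        if (token == null || token.isEmpty()) {
            return false;
        }
        return token.codePoints().noneMatch(Character::isLetterOrDigit);
    }

    // Returns a copy of `tags` in which every symbol-only token that carries
    // `numberTag` is relabeled with `fallbackTag`. Inputs are left unchanged.
    // The tag names are caller-supplied assumptions, not fixed CoreNLP values.
    static List<String> correctTags(List<String> tokens, List<String> tags,
                                    String numberTag, String fallbackTag) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> corrected = new ArrayList<>(tags);
        for (int i = 0; i < tokens.size(); i++) {
            if (numberTag.equals(tags.get(i)) && isSymbolOnly(tokens.get(i))) {
                corrected.set(i, fallbackTag);
            }
        }
        return corrected;
    }

    public static void main(String[] args) {
        // Example labels as they might come out of a pipeline (assumed values).
        List<String> tokens = List.of("Really", "?????", "42", "****");
        List<String> tags = List.of("O", "NUMBER", "NUMBER", "NUMBER");
        System.out.println(correctTags(tokens, tags, "NUMBER", "O"));
        // prints [O, O, NUMBER, O]
    }
}
```

This only masks the symptom downstream. It does not change how the underlying model scores symbol runs, which is the harder general problem described above.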