stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

StanfordNLP: Unable to identify Date with 7-class-ner #1071

Open tarunshah opened 4 years ago

tarunshah commented 4 years ago

I'm using StanfordNLP to extract date entities from text. Here's the code I tried:

import java.util.List;
import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;

public class StanfordNLP_POC
{

    public static void main(String[] args)
    {
        // 7-class MUC model: LOCATION, PERSON, ORGANIZATION, MONEY, PERCENT, DATE, TIME
        String classifierPath = "src/main/resources/classifiers/english.muc.7class.distsim.crf.ser.gz";

        String inputString = "Appointment Facility: ABC Medicine Clinic 05/07/2020 Progress Notes: Niel Armstrong, DO Current Medications Reason for Appointment";

        // Load only the statistical CRF model (no rule-based annotators)
        AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(classifierPath);

        // Tokenize and label the text; tokens are returned grouped by sentence
        List<List<CoreLabel>> out = classifier.classify(inputString);

        System.out.println(out);

        for (List<CoreLabel> sentence : out)
        {
            for (CoreLabel word : sentence)
            {
                String label = word.get(CoreAnnotations.AnswerAnnotation.class);
                // Skip tokens the model places outside any entity
                if ("O".equals(label))
                    continue;
                System.out.println(word.word() + " = " + label);
            }
        }
    }
}

I don't understand why it isn't extracting the date, even though the date is clearly identifiable in the text.

Also, when I use the full pipeline instead, it does extract the date, but it takes a bit longer.
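The loop in the code above prints one label per token, so a multi-token date such as "May 7, 2020" would come out as several separate DATE lines. Below is a minimal sketch of merging adjacent tokens that share a label into a single span. The `Token` and `Span` records and `mergeSpans` are hypothetical helpers, not CoreNLP classes; in real use the word and label would come from `CoreLabel.word()` and `AnswerAnnotation`. It assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

public class EntitySpans {

    // Hypothetical stand-in for a (word, NER label) pair produced by a classifier.
    public record Token(String word, String label) {}

    // A merged entity: the joined token text and its shared label.
    public record Span(String text, String label) {}

    /** Merges runs of adjacent tokens that share the same non-"O" label. */
    public static List<Span> mergeSpans(List<Token> tokens) {
        List<Span> spans = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = null;
        for (Token t : tokens) {
            if (t.label().equals(currentLabel)) {
                // Same label as the previous token: extend the current run.
                current.append(' ').append(t.word());
                continue;
            }
            // Label changed: flush the previous run if it was an entity.
            if (currentLabel != null && !currentLabel.equals("O")) {
                spans.add(new Span(current.toString(), currentLabel));
            }
            current.setLength(0);
            current.append(t.word());
            currentLabel = t.label();
        }
        // Flush the final run.
        if (currentLabel != null && !currentLabel.equals("O")) {
            spans.add(new Span(current.toString(), currentLabel));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Made-up labels for illustration only.
        List<Token> tokens = List.of(
            new Token("Visit", "O"),
            new Token("on", "O"),
            new Token("May", "DATE"),
            new Token("7", "DATE"),
            new Token(",", "DATE"),
            new Token("2020", "DATE"),
            new Token("at", "O"),
            new Token("ABC", "ORGANIZATION"),
            new Token("Clinic", "ORGANIZATION"));
        for (Span s : mergeSpans(tokens)) {
            System.out.println(s.text() + " = " + s.label());
        }
    }
}
```

Because the CRF models emit plain labels such as DATE rather than B-/I- prefixed tags, two different entities of the same type that sit next to each other would be merged into one span; this sketch does not try to separate them.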

AngledLuffa commented 4 years ago

The statistical model has no experience recognizing that particular date format, so it doesn't label it as a date.

If you run CoreNLP with only the statistical models, you can see that it doesn't recognize the date there either:

java edu.stanford.nlp.pipeline.StanfordCoreNLP -ner.statisticalOnly

The full CoreNLP pipeline additionally uses hard-coded rules that recognize dates in that format, which is why the pipeline finds the date while the classifier alone does not.
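To illustrate the difference between the two approaches, here is a toy sketch of a rule layered on top of a statistical tagger's output: wherever a token matches a hard-coded numeric-date pattern, its label is overridden with DATE, and every other label is left as the model produced it. The pattern, the `RuleDateTagger` class, and `applyDateRule` are hypothetical and far simpler than CoreNLP's actual date rules. The sketch also assumes the tokenizer keeps "05/07/2020" as a single token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RuleDateTagger {

    // Hypothetical hard-coded rule: NN/NN/YYYY with each NN in 1..31
    // (covers both MM/DD/YYYY and DD/MM/YYYY). Not CoreNLP's real grammar.
    private static final Pattern NUMERIC_DATE =
        Pattern.compile("(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|[12][0-9]|3[01])/\\d{4}");

    /** Returns a copy of modelLabels with DATE forced onto tokens the rule matches. */
    public static List<String> applyDateRule(List<String> words, List<String> modelLabels) {
        if (words.size() != modelLabels.size()) {
            throw new IllegalArgumentException("words and labels must be aligned");
        }
        List<String> out = new ArrayList<>(modelLabels);
        for (int i = 0; i < words.size(); i++) {
            // The rule must match the whole token, not just a substring.
            if (NUMERIC_DATE.matcher(words.get(i)).matches()) {
                out.set(i, "DATE");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Clinic", "05/07/2020", "Progress", "Notes");
        // Pretend the statistical model missed the date, as in the report.
        List<String> model = List.of("ORGANIZATION", "O", "O", "O");
        List<String> labels = applyDateRule(words, model);
        for (int i = 0; i < words.size(); i++) {
            System.out.println(words.get(i) + " = " + labels.get(i));
        }
    }
}
```

Running the statistical model first and letting rules override only the tokens they match mirrors the idea that the model and the hard-coded expressions complement each other; it also shows why the rule-based pass adds work on top of the classifier.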