sergey-tihon / Stanford.NLP.NET

Stanford NLP for .NET
http://sergey-tihon.github.io/Stanford.NLP.NET/
MIT License

Need help with a working example of Coreferencing c# #77

Closed Millymanz closed 6 years ago

Millymanz commented 6 years ago

I have been trying for a while to implement a method that performs coreference resolution with Stanford.NLP.NET. I have been testing sentences such as "Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008" or "Which Apple supplier’s share price goes up the most when the company releases a new product?"

var documentText = "Barack Obama was born in Hawaii.  He is the president. Obama was elected in 2008";
var props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, coref");
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,mention");
props.setProperty("annotators", " tokenize,ssplit,pos,lemma,ner,depparse");
props.setProperty("ner.useSUTime", "0");

var curDir = Environment.CurrentDirectory;
System.IO.Directory.SetCurrentDirectory(jarRoot);
pipeline = new StanfordCoreNLP(props);
System.IO.Directory.SetCurrentDirectory(curDir);

corefSystem = new SieveCoreferenceSystem(props);

Annotation document = new Annotation(documentText);
pipeline.annotate(document);

Dictionary<int, CorefChain> coref = document.get(new CorefCoreAnnotations.CorefChainAnnotation().getClass()) as Dictionary<int, CorefChain>;

P.S. I keep getting a null value for coref. The rest of the setup (the pipeline and the loading of the libraries) is handled by other code that I am not showing in this example.

Can you run through a working example of how to fully implement coreferencing?

Thanks

sergey-tihon commented 6 years ago

I have a test in F# that checks dependencies.

Annotation pipeline:

let props = Properties()
props.setProperty("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref") |> ignore
props.setProperty("sutime.binders","0") |> ignore
props.setProperty("ner.useSUTime","0") |> ignore

then you extract sentences from Annotation:

let sentences = annotation.get(CoreAnnotations.SentencesAnnotation().getClass()) :?> java.util.ArrayList

and then the dependencies between words:

        printfn "\nDependencies:"
        let deps = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation().getClass()) :?> SemanticGraph
        Expect.isNotNull deps "Semantic graph is null"
        for edge in deps.edgeListSorted().toArray() |> Seq.cast<SemanticGraphEdge> do
            let gov = edge.getGovernor()
            Expect.isNotNull gov "Governor is null"
            let dep = edge.getDependent()
            Expect.isNotNull dep "Dependent is null"
            printfn "%O(%s-%d,%s-%d)"
                (edge.getRelation())
                (gov.word()) (gov.index())
                (dep.word()) (dep.index())

Is this what you are looking for?

Millymanz commented 6 years ago

Hi, this is not what I am looking for. I want to know how to use

Dictionary<int, CorefChain> coref = document.get(new CorefCoreAnnotations.CorefChainAnnotation().getClass()) as Dictionary<int, CorefChain>;

I am trying to port the following coreference example from Java to C# or F#:

Map<Integer, CorefChain> coref = document.get(CorefChainAnnotation.class);

for(Map.Entry<Integer, CorefChain> entry : coref.entrySet()) {
    CorefChain c = entry.getValue();

    //this is because it prints out a lot of self references which aren't that useful
    if(c.getCorefMentions().size() <= 1)
        continue;

    CorefMention cm = c.getRepresentativeMention();
    String clust = "";
    List<CoreLabel> tks = document.get(SentencesAnnotation.class).get(cm.sentNum-1).get(TokensAnnotation.class);
    for(int i = cm.startIndex-1; i < cm.endIndex-1; i++)
        clust += tks.get(i).get(TextAnnotation.class) + " ";
    clust = clust.trim();
    System.out.println("representative mention: \"" + clust + "\" is mentioned by:");

    for(CorefMention m : c.getCorefMentions()){
        String clust2 = "";
        tks = document.get(SentencesAnnotation.class).get(m.sentNum-1).get(TokensAnnotation.class);
        for(int i = m.startIndex-1; i < m.endIndex-1; i++)
            clust2 += tks.get(i).get(TextAnnotation.class) + " ";
        clust2 = clust2.trim();
        //don't need the self mention
        if(clust.equals(clust2))
            continue;

        System.out.println("\t" + clust2);
    }
}
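
The index arithmetic in the loop above is easy to get wrong when porting it. As the loop bounds suggest, sentNum and startIndex are treated as 1-based and endIndex as one past the last token. The following is a minimal, library-free Java sketch of that conversion only; MentionSpanSketch and mentionText are hypothetical names, and plain string lists stand in for CoreNLP's sentence and token annotations.

```java
import java.util.List;

// Hypothetical stand-in: each sentence is a plain list of token strings.
// The parameter names mirror the CorefMention fields used in the loop above.
class MentionSpanSketch {

    // sentNum and startIndex are 1-based; endIndex is 1-based and exclusive,
    // so the 0-based token range is [startIndex - 1, endIndex - 1).
    static String mentionText(List<List<String>> sentences,
                              int sentNum, int startIndex, int endIndex) {
        List<String> tokens = sentences.get(sentNum - 1);
        return String.join(" ", tokens.subList(startIndex - 1, endIndex - 1));
    }

    public static void main(String[] args) {
        List<List<String>> sentences = List.of(
            List.of("Barack", "Obama", "was", "born", "in", "Hawaii", "."),
            List.of("He", "is", "the", "president", "."));

        System.out.println(mentionText(sentences, 1, 1, 3)); // prints "Barack Obama"
        System.out.println(mentionText(sentences, 2, 1, 2)); // prints "He"
    }
}
```

Joining a sublist also avoids the trailing space that the original loop removes with trim().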

sergey-tihon commented 6 years ago

@Millymanz do you have the full Java sample (with the annotation pipeline definition)?

Millymanz commented 6 years ago

Unfortunately, I don't have the pipeline definition.

This is the best I have for the annotation:

        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        // Property values should be strings: a Boolean stored via put() is not returned by getProperty().
        props.setProperty("dcoref.score", "true");
        pipeline = new StanfordCoreNLP(props);
        Annotation document = new Annotation("The atom is a basic unit of matter, it   consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.");

        pipeline.annotate(document);

sergey-tihon commented 6 years ago

I've added the sample from https://stanfordnlp.github.io/CoreNLP/coref.html; here is the commit: https://github.com/sergey-tihon/Stanford.NLP.NET/commit/9881323b0fb8ce2198afc9907fbe414d75a8cfc7

It seems that the annotators list should be tokenize,ssplit,pos,lemma,ner,parse,mention,coref
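
This matters for the snippet in the original question, which calls setProperty("annotators", ...) three times. java.util.Properties keeps a single value per key, so only the last call (a list without coref) would be in effect when the pipeline is built. This is likely one reason for the null result, though not necessarily the only one. A minimal sketch using only java.util.Properties (the class and method names are hypothetical):

```java
import java.util.Properties;

// Hypothetical demo class: shows that repeated setProperty calls on the
// same key overwrite each other, so only the last annotators list survives.
class AnnotatorsOverwriteSketch {

    static String effectiveAnnotators() {
        Properties props = new Properties();
        // The three calls from the original question, in the same order.
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, coref");
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,mention");
        props.setProperty("annotators", " tokenize,ssplit,pos,lemma,ner,depparse");
        return props.getProperty("annotators");
    }

    public static void main(String[] args) {
        String annotators = effectiveAnnotators();
        System.out.println(annotators);                   // the depparse list from the last call
        System.out.println(annotators.contains("coref")); // prints "false"
    }
}
```

Separately, a C# `as` cast yields null when the types are incompatible, which is consistent with the working sample reading the chains as a java.util.Map rather than as a .NET Dictionary.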

And for the sentence

Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.

you should see:

[screenshot: 2017-10-18_1208]

The result is similar to the one from corenlp.run:

[screenshot: 2017-10-18_1215]

The full source code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using java.util;
using java.io;
using edu.stanford.nlp.coref;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.pipeline;
using edu.stanford.nlp.util;
using Console = System.Console;
using edu.stanford.nlp.coref.data;
using java.lang;
using System.IO;

namespace standfordnlp
{
    class CorefAnnotator
    {
        // Sample from https://stanfordnlp.github.io/CoreNLP/coref.html
        static void Main()
        {
            var jarRoot = @"..\..\..\..\data\paket-files\nlp.stanford.edu\stanford-corenlp-full-2017-06-09\models";

            Annotation document = new Annotation("Barack Obama was born in Hawaii.  He is the president. Obama was elected in 2008.");
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
            props.setProperty("ner.useSUTime", "0");

            var curDir = Environment.CurrentDirectory;
            Directory.SetCurrentDirectory(jarRoot);
            var pipeline = new StanfordCoreNLP(props);
            Directory.SetCurrentDirectory(curDir);

            pipeline.annotate(document);

            var corefChainAnnotation = new CorefCoreAnnotations.CorefChainAnnotation().getClass();
            var sentencesAnnotation = new CoreAnnotations.SentencesAnnotation().getClass();
            var corefMentionsAnnotation = new CorefCoreAnnotations.CorefMentionsAnnotation().getClass();

            Console.WriteLine("---");
            Console.WriteLine("coref chains");
            var corefChain = document.get(corefChainAnnotation) as Map;
            foreach (CorefChain cc in corefChain.values().toArray()) {
                Console.WriteLine($"\t{cc}");
            }
            var sentences = document.get(sentencesAnnotation) as ArrayList;
            foreach (CoreMap sentence in sentences.toArray()) {
                Console.WriteLine("---");
                Console.WriteLine("mentions");
                var corefMentions = sentence.get(corefMentionsAnnotation) as ArrayList;
                foreach (Mention m in corefMentions) {
                    Console.WriteLine("\t" + m);
                }
            }
        }
    }
}

Millymanz commented 6 years ago

Thank you so much for the example. You are a star.