microth / PathLSTM

Neural SRL model

Verb Sense Accuracy - Set-up Issues? #24

Closed: chrisoutwright closed this issue 3 years ago

chrisoutwright commented 3 years ago

In a short test, I ran the sentences below to check how accurately PathLSTM (PropBank/NomBank) disambiguates verb/noun senses. For "find out" and "go", the results seem puzzling, and I wonder whether something is wrong with my setup, as these verbs should be find.03 and go.02, respectively:

1   The the the DT  DT  _   _   2   2   NMOD    NMOD    _   _   _
2   waitress    waitress    waitress    NN  NN  _   _   3   3   SBJ SBJ _   _   A0
3   found   found   found   VBD VBD _   _   0   0   ROOT    ROOT    Y   find.01 _
4   out out out RP  RP  _   _   3   3   PRT PRT _   _   _
5   that    that    that    IN  IN  _   _   3   3   OBJ OBJ _   _   A1
6   she she she PRP PRP _   _   7   7   SBJ SBJ _   _   _
7   was be  be  VBD VBD _   _   5   5   SUB SUB _   _   _
8   fat fat fat JJ  JJ  _   _   7   7   PRD PRD _   _   _
9   .   .   .   .   .   _   _   3   3   P   P   _   _   _

1   We  we  we  PRP PRP _   _   2   2   SBJ SBJ _   _   A1
2   are be  be  VBP VBP _   _   0   0   ROOT    ROOT    _   _   _
3   going   go  go  VBG VBG _   _   2   2   VC  VC  Y   go.01   _
4   on  on  on  IN  IN  _   _   3   3   ADV ADV _   _   _
5   vacation    vacation    vacation    NN  NN  _   _   4   4   PMOD    PMOD    _   _   _
6   to  to  to  TO  TO  _   _   3   3   DIR DIR _   _   A4
7   Singapore   singapore   singapore   NNP NNP _   _   6   6   PMOD    PMOD    _   _   _
8   .   .   .   .   .   _   _   2   2   P   P   _   _   _

I used the following models:

srl-ACL2016-eng
CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer
CoNLL2009-ST-English-ALL.anna-3.3.parser
CoNLL2009-ST-English-ALL.anna-3.3.postagger
stanford-corenlp-3.7.0
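To spot-check predicted senses over many sentences, here is a minimal Java sketch that lists the PRED value of every row flagged as a predicate (FILLPRED = Y). It assumes the standard CoNLL-2009 column order (ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED, PRED, APREDs) and splits on any whitespace, so it also accepts space-separated rows like the ones above. The class name PredicateSenses is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Lists predicate senses (the PRED column) from CoNLL-2009 formatted output. */
final class PredicateSenses {

    // 0-based CoNLL-2009 column indices.
    static final int FORM = 1;
    static final int FILLPRED = 12;
    static final int PRED = 13;

    /** Returns "form -> sense" for every row whose FILLPRED column is "Y". */
    static List<String> senses(String conll) {
        List<String> result = new ArrayList<>();
        for (String line : conll.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // a blank line marks a sentence boundary
            }
            // Split on any whitespace so both tab- and space-separated rows work.
            String[] cols = trimmed.split("\\s+");
            if (cols.length > PRED && cols[FILLPRED].equals("Y")) {
                result.add(cols[FORM] + " -> " + cols[PRED]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String conll = String.join("\n",
            "1\tWe\twe\twe\tPRP\tPRP\t_\t_\t2\t2\tSBJ\tSBJ\t_\t_\tA1",
            "2\tare\tbe\tbe\tVBP\tVBP\t_\t_\t0\t0\tROOT\tROOT\t_\t_\t_",
            "3\tgoing\tgo\tgo\tVBG\tVBG\t_\t_\t2\t2\tVC\tVC\tY\tgo.01\t_");
        System.out.println(senses(conll)); // prints [going -> go.01]
    }
}
```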
microth commented 3 years ago

Hi there,

The predicate disambiguation model in PathLSTM is unchanged from previous work (mate-tools). It is a simple logistic regression classifier that uses binary lexical and syntactic indicator features. As the model is trained on CoNLL-2009 data, it probably won't work well on other text types (there are probably not many cases of waitresses or of going on vacation in news text). Hope this helps.
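To make the point about training data concrete, here is a toy Java sketch of sense selection with one-vs-rest logistic regression over binary indicator features. Everything in it is hypothetical (the class SenseClassifier, the feature strings, the weights) and it is not the mate-tools implementation; it only illustrates that a lexical cue such as a particle child can flip the decision when the model has learned a weight for it, and that otherwise the bias of the more frequent sense wins.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy predicate-sense classifier: one-vs-rest logistic regression over binary
 * indicator features. Senses, feature names and weights are hypothetical and
 * are NOT taken from PathLSTM or mate-tools.
 */
final class SenseClassifier {

    private final Map<String, Map<String, Double>> weights = new HashMap<>();
    private final Map<String, Double> bias = new HashMap<>();

    /** Registers a sense with its bias (prior) and per-feature weights. */
    void addSense(String sense, double b, Map<String, Double> w) {
        bias.put(sense, b);
        weights.put(sense, w);
    }

    /** P(sense | x) = sigmoid(bias + sum of the weights of the active features). */
    double probability(String sense, Set<String> activeFeatures) {
        double z = bias.get(sense);
        Map<String, Double> w = weights.get(sense);
        for (String f : activeFeatures) {
            z += w.getOrDefault(f, 0.0); // features with no learned weight add nothing
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Returns the sense with the highest probability. */
    String predict(Set<String> activeFeatures) {
        String best = null;
        double bestP = -1.0;
        for (String sense : bias.keySet()) {
            double p = probability(sense, activeFeatures);
            if (p > bestP) {
                bestP = p;
                best = sense;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SenseClassifier clf = new SenseClassifier();
        clf.addSense("find.01", 1.0, new HashMap<>()); // frequent sense: high prior
        Map<String, Double> w03 = new HashMap<>();
        w03.put("child:out/RP", 3.0);                   // particle cue for "find out"
        clf.addSense("find.03", -1.0, w03);

        Set<String> plain = new HashSet<>(Arrays.asList("lemma:find", "pos:VBD"));
        Set<String> withParticle = new HashSet<>(plain);
        withParticle.add("child:out/RP");

        System.out.println(clf.predict(plain));        // prints find.01
        System.out.println(clf.predict(withParticle)); // prints find.03
    }
}
```

If find.03 had no learned weight for child:out/RP (for example, because the construction is rare in the training data), both inputs would be labeled find.01.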

Best, Michael

chrisoutwright commented 3 years ago

Thank you, Michael, this is very informative. I am assessing the FrameNet model; the idea is to transform sentences into frame-based triples for a knowledge graph with additional properties (named entities, etc.). Is there a way to differentiate Frames from their Frame Elements? The column number doesn't always match (FE to Frames), so one would need to do a checkup on the most fitting frame, depending on seen frame elements? So far I have done that in Python, transforming the CoNLL-2009 output manually. Is there a way to generate this information via the classes in the code? I have written a simple server in Java to access the model predictions from my Python project (my notebook has too little memory to run the model itself). It would be great if one could get the frame-to-elements mapping directly.

Server with Grizzly:

package ch.digital_sparrow.PathLSTM_service;

import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

import org.glassfish.grizzly.http.server.HttpHandler;
import org.glassfish.grizzly.http.server.HttpServer;
import org.glassfish.grizzly.http.server.Request;
import org.glassfish.grizzly.http.server.Response;

public class Service {

    public static void main(String[] args) {

        // Load all models once at start-up (the expensive step).
        final Pipeline my_srl_pipe = new Pipeline();

        HttpServer server = HttpServer.createSimpleServer();

        // /time: returns the current server time (useful as a health check).
        server.getServerConfiguration().addHttpHandler(
            new HttpHandler() {
                @Override
                public void service(Request request, Response response) throws Exception {
                    final SimpleDateFormat format = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
                    final String date = format.format(new Date(System.currentTimeMillis()));
                    response.setContentType("text/plain");
                    response.setContentLength(date.length());
                    response.getWriter().write(date);
                }
            },
            "/time");

        // /parse: POST body with one sentence per line, returns CoNLL-2009 rows.
        server.getServerConfiguration().addHttpHandler(
            new HttpHandler() {
                @Override
                public void service(Request request, Response response) throws Exception {
                    String input = request.getPostBody(1024).toStringContent(StandardCharsets.UTF_8);
                    String parsed;
                    // The pipeline keeps its output in shared state, so handle one request at a time.
                    synchronized (my_srl_pipe) {
                        my_srl_pipe.parseFromInputString(input, true);
                        parsed = my_srl_pipe.getCoNLLString();
                    }
                    // Content-Length counts bytes, not characters.
                    byte[] body = parsed.getBytes(StandardCharsets.UTF_8);
                    response.setContentType("text/plain; charset=UTF-8");
                    response.setContentLength(body.length);
                    response.getOutputStream().write(body);
                }
            },
            "/parse");

        try {
            server.start();
            System.out.println("Press Enter to stop the server...");
            System.in.read();
        } catch (Exception e) {
            System.err.println(e);
        } finally {
            server.shutdownNow();
        }
    }
}

The modified Pipeline class, which accepts input strings:


package ch.digital_sparrow.PathLSTM_service;

import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.zip.ZipException;
import java.io.BufferedReader;
import se.lth.cs.srl.CompletePipeline;
import se.lth.cs.srl.options.CompletePipelineCMDLineOptions;
import se.lth.cs.srl.util.FileExistenceVerifier;
import se.lth.cs.srl.corpus.Sentence;
import java.io.File;
import java.io.FileInputStream;

public class Pipeline {

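    /** Collects the CoNLL-2009 string of every parsed sentence. */
    class CoNLLOutput {
        int senCount = 0;
        ArrayList<String> sentences = new ArrayList<String>();

        public void write(Sentence s) {
            // Sentence.toString() yields the CoNLL-2009 rows;
            // the trailing blank line separates sentences.
            sentences.add(s.toString() + "\n\n");
            senCount++;
        }
    }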

    public CompletePipeline cpl;
    private CompletePipelineCMDLineOptions options;
    public CoNLLOutput out;

    public Pipeline() {

        // Placeholder: the actual input is supplied later via
        // parseFromInputString() or parseFromInputTexfile().
        String test_sentence_arg = "";
        String path_prefix = "E:/eclipse-workspace/PathLSTM-pre-illinois-built/models/";
        String lemma_arg = path_prefix + "CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer.model";
        String pos_arg = path_prefix + "CoNLL2009-ST-English-ALL.anna-3.3.postagger.model";
        String parser_arg = path_prefix + "CoNLL2009-ST-English-ALL.anna-3.3.parser.model";
        String srl_arg = path_prefix + "srl-ACL2016-eng.model";

        String stanford_jar = path_prefix + "stanford-corenlp-3.8.0.jar"; // not passed to the pipeline
        String srl_frame = path_prefix + "srl-ICCG16-stanford-eng.model";

        // Passed to -framenet.
        String frame_net = path_prefix;

        // PropBank/NomBank configuration (kept for reference; not used below).
        String[] myargs = {"eng", "-tokenize", "-lemma", lemma_arg, "-reranker", "-externalNNs",
                "-tagger", pos_arg, "-parser", parser_arg, "-srl", srl_arg, "-test", test_sentence_arg};

        // FrameNet configuration: this is the one actually loaded.
        String[] myargs_frame = {"fnet", "-test", test_sentence_arg, "-srl", srl_frame, "-reranker",
                "-externalNNs", "-globalFeats", "-tokenize", "-framenet", frame_net, "-stanford",
                "-out", "out.conll"};

        options = new CompletePipelineCMDLineOptions();

        options.parseCmdLineArgs(myargs_frame);
        String error = FileExistenceVerifier.verifyCompletePipelineAllNecessaryModelFiles(options);
        if (error != null) {
            System.err.println(error);
            System.err.println();
            System.err.println("Aborting.");
            System.exit(1);
        }

        try {
            cpl = CompletePipeline.getCompletePipeline(options);
            out = new CoNLLOutput();
            System.out.println("Pipeline successfully created!");
        } catch (ClassNotFoundException | IOException e) {
            // ZipException is an IOException, so it is caught here as well.
            // Without the models cpl would stay null, so abort instead of continuing.
            e.printStackTrace();
            System.err.println("Could not create the pipeline. Aborting.");
            System.exit(1);
        }
    }

    public void parseFromInputTexfile(String textfile_path, Boolean reset) {

        if (reset) {
            out = new CoNLLOutput();
        }
        String path_chosen = System.getProperty("user.dir") + "/" + textfile_path;
        System.out.println("Loading from file: " + path_chosen);
        options.input = new File(path_chosen);
        try {
            getCoNLLStringfromFile(cpl);
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Failed to parse file for SRL");
            System.exit(1);
        }
    }

    public void parseFromInputString(String text, Boolean reset) {

        if (reset) {
            out = new CoNLLOutput();
        }

        try {
            getCoNLLStringfromString(cpl, text);
        } catch (Exception e) {
            // Do not call System.exit() here: inside the HTTP service,
            // one bad request would otherwise shut down the whole server.
            throw new RuntimeException("Failed to parse input string for SRL", e);
        }
    }

    public void printCoNLLString() {

        for(String s: out.sentences){

            System.out.println(s);
        }
    }

    public String getCoNLLString() {
        // Concatenate all sentences; each one already ends with a blank line.
        return String.join("", out.sentences);
    }

    public ArrayList<String> getCoNLLStringArray() {

        return out.sentences;

    }

    private int getCoNLLStringfromFile(CompletePipeline pipeline) throws Exception {

        // try-with-resources closes the reader even if parsing fails.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(options.input), Charset.forName("UTF-8")))) {
            String str;
            while ((str = in.readLine()) != null) {
                if (str.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                Sentence s = pipeline.parse(str);
                out.write(s);

                if (out.senCount % 5 == 0)
                    System.out.println("Processing sentence " + out.senCount);
            }
        }

        return out.senCount;
    }

    private int getCoNLLStringfromString(CompletePipeline pipeline, String sentences) throws Exception {

        // One sentence per line; try-with-resources closes the scanner.
        try (Scanner scanner = new Scanner(sentences)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                Sentence s = pipeline.parse(line);
                out.write(s);

                if (out.senCount % 5 == 0)
                    System.out.println("Processing sentence " + out.senCount);
            }
        }

        return out.senCount;
    }

}
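A client in any language can call this service over plain HTTP. As a minimal Java sketch, the hypothetical helper ParseClient below POSTs UTF-8 text (one sentence per line) to the /parse endpoint using only HttpURLConnection and returns the response body. The base URL is an assumption; HttpServer.createSimpleServer() listens on port 8080 by default. Since the handler reads the body with getPostBody(1024), it may be worth checking how longer request bodies are handled before sending large batches.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Minimal client for the /parse endpoint (hypothetical helper). */
final class ParseClient {

    /** POSTs one sentence per line and returns the CoNLL-2009 response body. */
    static String parse(String baseUrl, String sentences) throws IOException {
        byte[] body = sentences.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl + "/parse").openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body);
            }
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Server returned HTTP " + status);
            }
            // Read the whole response and decode it as UTF-8.
            try (InputStream is = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = is.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes the service runs locally on the default port 8080.
        System.out.println(parse("http://localhost:8080",
                "The waitress found out that she was fat.\n"));
    }
}
```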
microth commented 3 years ago

Hi there,

(...) Is there a way to differentiate Frames from their Frame Elements? The column number doesn't always match (FE to Frames)

I am not sure I understand the problem. If a word evokes a frame, that frame is listed in a specific column (PRED). For each evoked frame n, there is one additional column (APREDn) that marks (the head words of) the corresponding FEs. These additional columns follow the order in which the frames are evoked in the sentence. For details, see the CoNLL data description here: https://ufal.mff.cuni.cz/conll2009-st/task-description.html
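Following that column layout, here is a minimal Java sketch that groups the arguments (FEs) of one sentence by the predicate (frame) that licenses them: the n-th predicate row (FILLPRED = Y) owns the n-th APRED column. It assumes 0-based columns with FILLPRED at index 12, PRED at 13 and APRED1 at 14; the class FrameElementMapper, its "PRED (ID:FORM)" keys and the "label=form" values are hypothetical choices. As in the APRED columns themselves, only head words are reported. The demo input is a made-up PropBank-style sentence with two predicates; FrameNet output would carry frames in PRED and FE labels in the APRED columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Groups arguments (FEs) of one CoNLL-2009 sentence by predicate (frame). */
final class FrameElementMapper {

    // 0-based CoNLL-2009 column indices.
    static final int ID = 0, FORM = 1, FILLPRED = 12, PRED = 13, FIRST_APRED = 14;

    /**
     * Maps "PRED (ID:FORM)" of each predicate to its "label=form" arguments.
     * The n-th APRED column belongs to the n-th predicate row (FILLPRED = Y).
     */
    static Map<String, List<String>> map(String sentence) {
        List<String[]> rows = new ArrayList<>();
        for (String line : sentence.split("\\R")) {
            if (!line.trim().isEmpty()) {
                rows.add(line.trim().split("\\s+")); // tabs or spaces
            }
        }

        // Predicate rows, in the order in which they occur in the sentence.
        List<String[]> predicates = new ArrayList<>();
        for (String[] r : rows) {
            if (r.length > PRED && r[FILLPRED].equals("Y")) {
                predicates.add(r);
            }
        }

        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int n = 0; n < predicates.size(); n++) {
            String[] p = predicates.get(n);
            int col = FIRST_APRED + n; // the APRED column of predicate n
            List<String> roles = new ArrayList<>();
            for (String[] r : rows) {
                if (col < r.length && !r[col].equals("_")) {
                    roles.add(r[col] + "=" + r[FORM]);
                }
            }
            result.put(p[PRED] + " (" + p[ID] + ":" + p[FORM] + ")", roles);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = String.join("\n",
            "1\tMary\tmary\tmary\tNNP\tNNP\t_\t_\t2\t2\tSBJ\tSBJ\t_\t_\tA0\tA0",
            "2\twants\twant\twant\tVBZ\tVBZ\t_\t_\t0\t0\tROOT\tROOT\tY\twant.01\t_\t_",
            "3\tto\tto\tto\tTO\tTO\t_\t_\t2\t2\tOPRD\tOPRD\t_\t_\tA1\t_",
            "4\tleave\tleave\tleave\tVB\tVB\t_\t_\t3\t3\tIM\tIM\tY\tleave.01\t_\t_");
        System.out.println(map(s));
        // prints {want.01 (2:wants)=[A0=Mary, A1=to], leave.01 (4:leave)=[A0=Mary]}
    }
}
```

Feeding it each blank-line-separated block of the service output gives the frame-to-elements mapping per sentence, without guessing the best-fitting frame.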

(...) so one would need to do a checkup on the most fitting frame, depending on seen frame elements? (...)

No, this is not necessary. The model only predicts frame elements that match an evoked frame. If you observe anything else, you might be using the wrong language indicator ("fnet" should be used instead of "eng"), or you might have forgotten to specify the path to your copy of the FrameNet data (-fnet ). For details, see the example script here: https://github.com/microth/PathLSTM/blob/pre-illinois-built/scripts/parse_fnet.sh

Best, Michael