
Questions to ask Yue Zhang (ZPar creator) #9

Open karuiwu opened 11 years ago

karuiwu commented 11 years ago

This isn't really an issue, more of a compilation of questions we have about ZPar. We'll send these questions if we don't make any progress on them soon.

  1. Is there dynamic POS tagging at test time? I see here that the generic version of ZPar compiles a tag set during training, and here that a separate POS tagger exists. I'm guessing that the parser only accepts POS-tagged data, since parsing gave neither output nor any indication of error, and that once the POS tags are given, they aren't changed at test time. If that's the case, it might be in our interest to design a revision operator that re-tags words (a rough sketch follows below).
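A minimal sketch of what such a re-tagging operator might look like, purely as an illustration; the `ParserState` and `retag` names are hypothetical and not part of ZPar or reparse:

```python
# Hypothetical sketch of a revision operator that re-tags one word.
# Nothing here is ZPar code; the names are made up for illustration.

from dataclasses import dataclass, field

@dataclass
class ParserState:
    """Minimal arc-eager configuration: stack, buffer, and current POS tags."""
    stack: list = field(default_factory=list)    # word indices on the stack
    buffer: list = field(default_factory=list)   # word indices still to be read
    tags: dict = field(default_factory=dict)     # word index -> POS tag

def retag(state: ParserState, word_index: int, new_tag: str) -> ParserState:
    """Revision operator: change one word's POS tag, leaving the rest of the
    configuration alone. A real version would also have to revisit any parser
    decisions that depended on the old tag."""
    new_tags = dict(state.tags)
    new_tags[word_index] = new_tag
    return ParserState(stack=list(state.stack),
                       buffer=list(state.buffer),
                       tags=new_tags)

# Usage: re-tag word 2 from NN to VB.
state = ParserState(stack=[2], buffer=[3, 4], tags={2: "NN"})
state = retag(state, 2, "VB")
```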
jeisner commented 11 years ago

On Sun, May 26, 2013 at 8:36 PM, karuiwu notifications@github.com wrote:

Building a classifier will give us more control over the revision process. An important question is what sort of classifier we should make. Here is the current plan:

  • A two- or three-classifier setup, where one classifier is binary and detects whether or not an error has been made yet, and the other classifier(s) determine(s) the location and type of the error.

Example: For the sentence "The old man the boat", suppose that after some number of actions the stack holds ["man"] and the buffer holds ["the", "boat"], with "man" labeled a Noun. At that point the first, binary classifier will say "something's gone wrong"; then the other classifier(s) determine(s) where exactly the error was made and how we can fix it.

That's fine. It may not matter much for accuracy. As we've said in past meetings, the main reason to pick this architecture is speed. If the first classifier thinks there hasn't been a recent error, then the second classifier doesn't have to be run.

More precisely: the output of the first classifier determines whether the second classifier chooses the next action from among the usual arc-eager actions {shift, reduce, arc-left, arc-right}, or from a larger set that also includes various revision operators.
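A minimal sketch of that gating, with stand-in model classes (none of these names come from ZPar or reparse; a trained detector and action classifier would replace the stubs):

```python
# Hypothetical sketch of the two-stage decision described above.

BASE_ACTIONS = ["shift", "reduce", "arc-left", "arc-right"]
REVISION_ACTIONS = ["retag"]          # illustrative revision operator(s)

class ErrorDetector:
    """Binary classifier: has an error probably been made recently?"""
    def predict(self, state) -> bool:
        raise NotImplementedError     # a trained model would go here

class ActionClassifier:
    """Scores candidate actions given the current parser state."""
    def best(self, state, candidates):
        raise NotImplementedError     # a trained model would go here

def next_action(state, detector: ErrorDetector, actions: ActionClassifier):
    """The cheap binary detector runs first; the action classifier only sees
    the revision operators when the detector flags a likely error, which keeps
    the common (no-error) case fast."""
    candidates = BASE_ACTIONS + (REVISION_ACTIONS if detector.predict(state) else [])
    return actions.best(state, candidates)
```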

This contrasts with our current classifier, which was trained to predict whether or not the next move made by the parser would be incorrect.

The next move? Wasn't it trained to predict whether any of the previous 3 moves were incorrect? (Based on the current state after those moves were taken.)