karuiwu / reparse


build classifier for parser errors #3

Open jeisner opened 11 years ago

jeisner commented 11 years ago

Based on the training set for #1, build a classifier (e.g., using MegaM, #2). Do error analysis on the dev set -- when does the classifier have false positives and false negatives? Study those examples and try to devise new features that improve the classifier. Iterate.
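For concreteness, here is a minimal Python sketch of that error-analysis loop, assuming one gold label and one predicted label per line (1 = no mistake, 0 = mistake, matching the encoding discussed below); the file names are hypothetical:

```python
# Minimal error-analysis sketch: bucket dev-set mistakes into false
# positives (flagged an error that wasn't there) and false negatives
# (missed a real error), keeping each feature line for inspection.

def error_analysis(gold_path, pred_path, feat_path):
    false_pos, false_neg = [], []
    with open(gold_path) as g, open(pred_path) as p, open(feat_path) as f:
        for gold, pred, feats in zip(g, p, f):
            gold, pred = gold.strip(), pred.strip()
            if pred == "0" and gold == "1":
                false_pos.append(feats.strip())
            elif pred == "1" and gold == "0":
                false_neg.append(feats.strip())
    return false_pos, false_neg

if __name__ == "__main__":
    fps, fns = error_analysis("dev.gold", "dev.pred", "dev.megam")
    print(f"{len(fps)} false positives, {len(fns)} false negatives")
    for line in fps[:20]:  # eyeball a few mistakes for new feature ideas
        print("FP:", line)
```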

karuiwu commented 11 years ago

Some of the features mentioned include the (rarity of the) POS trigram and the (rarity of the) dependency subtree over those three words. Currently thinking about the impact on classification if the feature file looks like this:

1 DET_N_V ...

where 1 represents that no mistake has happened, as opposed to:

1 COMMON_TRIGRAM ...

or

1 DET_N_V 0.2 ...

where 0.2 would be the feature value representing the probability of DET_N_V occurring out of all possible trigram combinations. Not sure that the third one is a valid feature value ...
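To make the three options concrete, here is a small Python sketch that emits one training line in each style; the 0.2 probability is invented for illustration, and note that (as far as I know) MegaM reads bare feature names as binary features by default and name/value pairs only with its -fvals option:

```python
# One instance ("no mistake" at a state whose POS trigram is DET N V),
# written three ways.

label = 1              # 1 = no mistake has happened
trigram = "DET_N_V"
p_trigram = 0.2        # hypothetical corpus probability of this trigram

print(f"{label} {trigram}")              # option 1: trigram as a binary feature
print(f"{label} COMMON_TRIGRAM")         # option 2: rarity bucketed into a binary feature
print(f"{label} {trigram} {p_trigram}")  # option 3: real-valued feature (MegaM -fvals)
```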

(Check out the File Formats section to review the training file format.)

The NLP homework on log-linear models, as well as the example feature values given on Hal's website, should help, though the answers are unfortunately not immediately obvious. (Just want to document the process by commenting here.)

jeisner commented 11 years ago

For reasons we discussed in today's meeting, I'd just use binary features like DET_N_V (and backoff features like DET_N and N_V and so on).
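A quick sketch of how those binary trigram and backoff features might be generated (the function is made up for illustration; the tags are from the example above):

```python
def trigram_features(t1, t2, t3):
    """Full POS trigram plus lower-order backoff features, all binary."""
    return [
        f"{t1}_{t2}_{t3}",  # DET_N_V
        f"{t1}_{t2}",       # DET_N
        f"{t2}_{t3}",       # N_V
        t1, t2, t3,         # unigram backoffs
    ]

# One MegaM-style line: label followed by binary feature names.
print(1, *trigram_features("DET", "N", "V"))
# -> 1 DET_N_V DET_N N_V DET N V
```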


karuiwu commented 11 years ago

The Agenda

• Create training/dev sets for the classifier and MaltParser (see the sketch after this list)
• Update classifier features
• Analyze classifier output
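As a starting point for the first item, a hedged sketch of splitting a CoNLL-format treebank (blank-line-separated sentences, as MaltParser consumes) into train/dev portions; the file names and the 90/10 split are assumptions:

```python
# Split a CoNLL-format file into train/dev by whole sentences.

def read_sentences(path):
    sent = []
    with open(path) as f:
        for line in f:
            if line.strip():
                sent.append(line)
            elif sent:
                yield sent
                sent = []
    if sent:
        yield sent

sentences = list(read_sentences("treebank.conll"))
cut = int(0.9 * len(sentences))  # 90% train, 10% dev
for name, chunk in [("train.conll", sentences[:cut]),
                    ("dev.conll", sentences[cut:])]:
    with open(name, "w") as out:
        for sent in chunk:
            out.writelines(sent)
            out.write("\n")  # blank line between sentences
```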

karuiwu commented 11 years ago

Timeline

  1. Left-right error detection
    • binary classification of error type/location
    • implement the corresponding features
    • Notes: look at the psycholinguistics literature on backtracking (John Hale)
    • equation vs. classifier with features
  2. Design and implement revision operators
    • learn and evaluate on different languages and beam widths
    • check out Vowpal Wabbit's SEARN and DAgger

Week 1: Left-right error detection works
Week 2: Revision operators are designed

Description of classifier(s)

Building a classifier will give us more control over the revision process. An important question is what sort of classifier we should make. Here is the current plan:

Example sentence: "The old man the boat"
Description: At the point where ["man"] and ["the", "boat"] are on the stack and buffer respectively, after a number of actions in which "man" has been labeled a noun, the first, binary classifier will say "something's gone wrong". Then the other classifier(s) determine(s) where exactly the error was made and how we can fix it.

This contrasts with our current classifier, which was trained to predict whether or not the next move made by the parser would be incorrect.
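A toy sketch of the two-stage design, purely illustrative (the classes, the features, and the stand-in decision rules are hypothetical, not project code):

```python
class ErrorDetector:
    """Stage 1: binary classifier that says "something's gone wrong"."""
    def predict(self, feats):
        # stand-in rule: a lone noun on the stack with a determiner up next
        return "N" in feats and "next=the" in feats

class ErrorLocator:
    """Stage 2: decide where the error was made (which action to revise)."""
    def predict(self, feats, history):
        return len(history) - 1  # stand-in: blame the most recent action

def check_state(stack, buffer, history, detector, locator):
    feats = [tag for _, tag in stack] + [f"next={w}" for w in buffer[:1]]
    if detector.predict(feats):
        return locator.predict(feats, history)
    return None  # no error detected; keep parsing

# "The old man the boat": stack = ["man"] tagged N, buffer = ["the", "boat"]
step = check_state([("man", "N")], ["the", "boat"],
                   ["shift", "shift", "reduce"],
                   ErrorDetector(), ErrorLocator())
print("revise action at index:", step)  # -> 2
```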

May 25–28, 2013