jeisner opened this issue 11 years ago
Some of the features we discussed include the (rarity of the) POS trigram and the (rarity of the) dependency subtree over those three words. I'm currently thinking about the impact on classification if the feature file looks like this:
1 DET_N_V ...
where 1 indicates that no mistake has occurred, as opposed to:
1 COMMON_TRIGRAM ...
or
1 DET_N_V 0.2 ...
where 0.2 would be a feature value giving the probability of DET_N_V occurring out of all possible POS trigrams. I'm not sure the third option is a valid feature value ...
(Check out the File Formats section to review the training file format.)
The NLP homework on log-linear models, as well as the example feature values on Hal's website, should help, though the answers are unfortunately not immediately obvious. (I just want to document the process by commenting here.)
For reasons we discussed in today's meeting, I'd just use binary features like DET_N_V (and backoff features like DET_N and N_V and so on).
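To make the binary-feature suggestion concrete, here is a small sketch of a feature extractor that emits the full POS-trigram feature plus bigram and unigram backoff features. The class and method names are illustrative, not from the reparse codebase, and binary features are simply represented by their presence in the list (value 1).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: given three POS tags, emit the binary trigram
// feature and its backoff bigram/unigram features.
public class TrigramFeatures {
    public static List<String> trigramFeatures(String t1, String t2, String t3) {
        List<String> feats = new ArrayList<>();
        feats.add(t1 + "_" + t2 + "_" + t3);  // full trigram, e.g. DET_N_V
        feats.add(t1 + "_" + t2);             // backoff bigram
        feats.add(t2 + "_" + t3);             // backoff bigram
        feats.add(t1);                        // backoff unigrams
        feats.add(t2);
        feats.add(t3);
        return feats;
    }
}
```

For example, `trigramFeatures("DET", "N", "V")` yields `DET_N_V, DET_N, N_V, DET, N, V`; the backoff features let the classifier generalize when a particular full trigram is rare in the training data.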
!parserState.isTerminalState() should be true when there is more than one element on the stack / the dependency tree is incomplete, but the current implementation only checks the input buffer:
public boolean isTerminalState() { return input.isEmpty(); }
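Under the assumption that the parser state exposes an input buffer and a stack (the field names below are illustrative), a sketch of the fix is to treat the state as terminal only when the buffer is empty and the stack has been reduced to a single element, i.e. the dependency tree is complete:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch only: a minimal parser state illustrating the suggested check.
public class ParserState {
    private final List<String> input = new ArrayList<>();
    private final Deque<String> stack = new ArrayDeque<>();

    public ParserState(List<String> buffer, List<String> stackItems) {
        input.addAll(buffer);
        stackItems.forEach(stack::push);
    }

    public boolean isTerminalState() {
        // Checking input.isEmpty() alone reports "terminal" even when
        // the stack still holds multiple unattached subtrees.
        return input.isEmpty() && stack.size() <= 1;
    }
}
```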
Week 1: Left-Right error detection works
Week 2: Revision operators are designed
Building a classifier will give us more control over the revision process. An important question is what sort of classifier should we make? Here is the current plan:
Example sentence: "The old man the boat." Suppose that, after some number of actions, ["man"] is on the stack and ["the", "boat"] is in the buffer, with "man" labeled a Noun. The first, binary classifier says "something's gone wrong"; the other classifier(s) then determine where exactly the error was made and how we can fix it.
This contrasts with our current classifier, which was trained to predict whether or not the next move made by the parser would be incorrect.
Based on the training set for #1, build a classifier (e.g., using MegaM, #2). Do error analysis on the dev set -- when does the classifier have false positives and false negatives? Study those examples and try to devise new features that improve the classifier. Iterate.
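The error-analysis step above can be sketched as a simple count of the binary classifier's false positives and false negatives on the dev set, so that the misclassified examples can be pulled out and inspected for new features. Labels are assumed to be 1 (error) / 0 (no error); the class name is illustrative.

```java
// Minimal sketch of dev-set error analysis for the binary
// "something's gone wrong" classifier.
public class ErrorAnalysis {
    // Returns {falsePositives, falseNegatives} given gold and
    // predicted 0/1 labels of equal length.
    public static int[] confusionCounts(int[] gold, int[] pred) {
        int fp = 0, fn = 0;
        for (int i = 0; i < gold.length; i++) {
            if (pred[i] == 1 && gold[i] == 0) fp++;  // predicted error, none occurred
            if (pred[i] == 0 && gold[i] == 1) fn++;  // missed a real error
        }
        return new int[]{fp, fn};
    }
}
```

Recording the indices of the false positives and false negatives (rather than just the counts) would make it easy to study those examples directly and iterate on the feature set.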