jeisner opened this issue 11 years ago
Some of the features we discussed include the (rarity of the) POS trigram and the (rarity of the) dependency subtree over those three words. I'm currently thinking about the impact on classification if the feature file looks like this:
1 DET_N_V ...
where 1 indicates that no mistake has occurred, as opposed to:
1 COMMON_TRIGRAM ...
or
1 DET_N_V 0.2 ...
where 0.2 would be a feature value giving the probability of DET_N_V occurring out of all possible POS trigrams. I'm not sure the third option is a valid feature value ...
(Check out the File Formats section to review the training file format.)
The NLP homework on log-linear models, as well as the example feature values on Hal's website, should help, though the answers are unfortunately not immediately obvious. (I just want to document the process by commenting here.)
For reasons we discussed in today's meeting, I'd just use binary features like DET_N_V (and backoff features like DET_N and N_V and so on).
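To make the binary-feature suggestion concrete, here is a small sketch of a feature extractor that emits the full POS-trigram feature plus bigram and unigram backoff features. The class and method names are illustrative, not from the reparse codebase, and binary features are simply represented by their presence in the list (value 1).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: given three POS tags, emit the binary trigram
// feature and its backoff bigram/unigram features.
public class TrigramFeatures {
    public static List<String> trigramFeatures(String t1, String t2, String t3) {
        List<String> feats = new ArrayList<>();
        feats.add(t1 + "_" + t2 + "_" + t3);  // full trigram, e.g. DET_N_V
        feats.add(t1 + "_" + t2);             // backoff bigram
        feats.add(t2 + "_" + t3);             // backoff bigram
        feats.add(t1);                        // backoff unigrams
        feats.add(t2);
        feats.add(t3);
        return feats;
    }
}
```

For example, `trigramFeatures("DET", "N", "V")` yields `DET_N_V, DET_N, N_V, DET, N, V`; the backoff features let the classifier generalize when a particular full trigram is rare in the training data.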
!parserState.isTerminalState() should be true when there is more than one element on the stack / the dependency tree is incomplete, but the current implementation only checks the input buffer:
public boolean isTerminalState() { return input.isEmpty(); }
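Under the assumption that the parser state exposes an input buffer and a stack (the field names below are illustrative), a sketch of the fix is to treat the state as terminal only when the buffer is empty and the stack has been reduced to a single element, i.e. the dependency tree is complete:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch only: a minimal parser state illustrating the suggested check.
public class ParserState {
    private final List<String> input = new ArrayList<>();
    private final Deque<String> stack = new ArrayDeque<>();

    public ParserState(List<String> buffer, List<String> stackItems) {
        input.addAll(buffer);
        stackItems.forEach(stack::push);
    }

    public boolean isTerminalState() {
        // Checking input.isEmpty() alone reports "terminal" even when
        // the stack still holds multiple unattached subtrees.
        return input.isEmpty() && stack.size() <= 1;
    }
}
```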
Week 1: Left-Right error detection works
Week 2: Revision operators are designed
Building a classifier will give us more control over the revision process. An important question is what sort of classifier should we make? Here is the current plan:
Example sentence: "The old man the boat." Suppose that, after some number of actions, ["man"] is on the stack and ["the", "boat"] is in the buffer, with "man" labeled a Noun. The first, binary classifier says "something's gone wrong"; the other classifier(s) then determine where exactly the error was made and how we can fix it.
This contrasts with our current classifier, which was trained to predict whether or not the next move made by the parser would be incorrect.
Based on the training set for #1, build a classifier (e.g., using MegaM, #2). Do error analysis on the dev set -- when does the classifier have false positives and false negatives? Study those examples and try to devise new features that improve the classifier. Iterate.
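The error-analysis step above can be sketched as a simple count of the binary classifier's false positives and false negatives on the dev set, so that the misclassified examples can be pulled out and inspected for new features. Labels are assumed to be 1 (error) / 0 (no error); the class name is illustrative.

```java
// Minimal sketch of dev-set error analysis for the binary
// "something's gone wrong" classifier.
public class ErrorAnalysis {
    // Returns {falsePositives, falseNegatives} given gold and
    // predicted 0/1 labels of equal length.
    public static int[] confusionCounts(int[] gold, int[] pred) {
        int fp = 0, fn = 0;
        for (int i = 0; i < gold.length; i++) {
            if (pred[i] == 1 && gold[i] == 0) fp++;  // predicted error, none occurred
            if (pred[i] == 0 && gold[i] == 1) fn++;  // missed a real error
        }
        return new int[]{fp, fn};
    }
}
```

Recording the indices of the false positives and false negatives (rather than just the counts) would make it easy to study those examples directly and iterate on the feature set.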