karuiwu / reparse


Check out MaltParser's classifier. #7

Closed junekihong closed 11 years ago

junekihong commented 11 years ago

We are assuming that Maltparser is using a probabilistic classifier, and so we should check and see if this is the case.

junekihong commented 11 years ago

I found where MaltParser decides which action to take.

Basically, for each decision MaltParser makes, I found the array of possible actions, ranked from most probable to least probable. MaltParser takes the first element of this list and performs that action.

Because of this, I am convinced that MaltParser uses a probabilistic classifier, and it might be possible to use this to implement beam search.

jeisner commented 11 years ago

Are you sure that the elements of the vector are truly ranked by probability and not by score? And can you find the probabilities, not just the ranking? We need the probabilities for the uneasiness, survey, and council features that we've discussed.

That is, what makes you think it is "most probable" rather than "highest scoring"? What is the type of probability model? If you find something that says "log-linear model" or perhaps "decision tree," for example, then I'll believe you. :-)

MaltParser already has beam search implemented, doesn't it?

junekihong commented 11 years ago

I do have the associated numbers used to determine the ranking. But they do seem to be scores rather than probabilities. For one thing, a few of their values can be greater than 1. For another thing, the lowest values in the array tend to be negative values.
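A quick sanity check along these lines (a hypothetical helper, not MaltParser code) distinguishes a score vector from a probability vector:

```python
def looks_like_probabilities(values, tol=1e-6):
    """Heuristic check: a probability distribution lies in [0, 1]
    and sums to 1; raw classifier scores usually violate both."""
    in_range = all(0.0 <= v <= 1.0 for v in values)
    return in_range and abs(sum(values) - 1.0) < tol

looks_like_probabilities([1.8, 0.4, -0.6])  # False: values above 1 and below 0
looks_like_probabilities([0.7, 0.2, 0.1])   # True: in range and sums to 1
```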

Tomorrow, I'll take a closer look to see whether there are probabilities underlying these scores. I'll also check the command-line flag options to see whether beam search is already implemented and offered.

jeisner commented 11 years ago

Please look at the training code to see how the classifier is trained. That may tell you what kind of classifier it is. Alternatively, look at the papers that describe MaltParser -- they should answer that question.

Note that Ariya's code uses a log-linear model for this purpose.

I assume MaltParser would work rather badly if it didn't use beam search. The point of our project is that by adding backtracking (revision), we may be able to get away with a smaller beam.

junekihong commented 11 years ago

So I've been going through how the classifier is trained. I still haven't gone through it completely; I'll understand it more fully with more time.

Anyway, I do know that MaltParser is using a LIBLINEAR model, and it uses feature weights to help determine the scores of possible actions. Would more knowledge about linear classifiers help me determine whether this is all probabilistic?

MaltParser's website documentation links to the LIBLINEAR website: http://www.csie.ntu.edu.tw/~cjlin/liblinear/

The website also indicates that a LIBSVM option is available.

jeisner commented 11 years ago

Ok, so you are making progress. According to that website, LIBLINEAR supports several types of classifiers, including SVM and logistic regression. Which type does MaltParser currently use?

A linear model just means that the score of an action y in a context x is theta . f(x,y), where f(x,y) computes a feature vector and theta is the corresponding weight vector. That is, the score is a linear function of f(x,y).

The question is how theta is trained. There are many training algorithms for linear models.

Logistic regression corresponds to the conditional log-linear models that you learned in NLP. If MaltParser is using that, then we're in great shape. In that case, the vector of scores does indeed correspond to a vector of probabilities -- to get the latter, you merely have to exponentiate and renormalize the scores, in the standard way that is explained in section 2.2 of http://cs.jhu.edu/~jason/465/hw-prob/loglin/pdf/formulas.pdf. The training method interpreted the scores as probabilities in exactly that way, because it trained theta so as to maximize equation (23) from that handout.
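That exponentiate-and-renormalize step can be sketched in a few lines (illustrative Python, not MaltParser code): compute the linear scores theta . f(x,y), then softmax them into a distribution.

```python
import math

def linear_score(theta, f_xy):
    # score of action y in context x: the dot product theta . f(x, y)
    return sum(t * f for t, f in zip(theta, f_xy))

def scores_to_probs(scores):
    # exponentiate and renormalize (softmax); subtracting the max first
    # only guards against overflow and does not change the result
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Scores may be negative or above 1, but the renormalized
# values form a proper probability distribution.
probs = scores_to_probs([2.3, 0.1, -1.7])
```

This is exactly why raw scores above 1 or below 0 are no obstacle: the probabilistic interpretation lives one softmax away, provided the model was trained as a conditional log-linear model.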

So, is MaltParser using logistic regression? Or is there a command-line option to do so? If not, maybe we could add such an option (and retrain).

junekihong commented 11 years ago

So: MaltParser uses the LIBLINEAR library, which by default was set to SVM, so MaltParser was not using a probabilistic classifier. But the good news is that we were able to change that.

We found this page, which specifies the different classifiers we can make LIBLINEAR use: http://www.makelinux.net/man/1/L/liblinear-train

I was able to set MaltParser's LIBLINEAR model to use L2-regularized logistic regression, which I understand to be probabilistic.

jeisner commented 11 years ago

Thanks. However, there must be a reason that they preferred the SVM classifier in the standard release. I assume that it was more accurate at test time, or faster to train, or conceivably faster at test time.

So, when you change to logistic regression, how much does that affect performance on a standard train/test set? It would be good to know. We are trying to improve MaltParser, so hopefully we don't have to damage it too much in order to improve it.

junekihong commented 11 years ago

So I ran some experiments to compare the two classifiers, measuring parse accuracy and runtime.

Accuracy: I found this website, which provides a script that evaluates CoNLL output against the gold parse: http://ilk.uvt.nl/conll/software.html The test sentences I used were the 7 Swedish sentences that came with MaltParser as example data.

I found that running the parser using the original configurations does in fact result in higher accuracies than our probabilistic modification. The full result files were pretty long, but here are the summaries:

Original classifier's accuracy:
Labeled attachment score: 61 / 78 * 100 = 78.21 %
Unlabeled attachment score: 67 / 78 * 100 = 85.90 %
Label accuracy score: 63 / 78 * 100 = 80.77 %

Modified classifier's accuracy:
Labeled attachment score: 47 / 78 * 100 = 60.26 %
Unlabeled attachment score: 54 / 78 * 100 = 69.23 %
Label accuracy score: 49 / 78 * 100 = 62.82 %

I can email the full result files if you would like to take a more detailed look.

Runtime: I ran MaltParser to parse the same set of test sentences for 10,000 iterations, retraining its classifier before each run. MaltParser outputs the total parsing time at the end, so I think this will give us a good idea of which classifier lets MaltParser parse faster.

Here are the total parsing times under the original configuration (retrain, then 10,000 iterations):
Parsing time: 00:01:37 (97217 ms)
Parsing time: 00:01:37 (97233 ms)
Parsing time: 00:01:35 (95907 ms)
Parsing time: 00:01:35 (95198 ms)
Parsing time: 00:01:37 (97329 ms)

Here are the total parsing times under our modified configuration (retrain, then 10,000 iterations):
Parsing time: 00:25:36 (1536654 ms)
Parsing time: 00:01:33 (93117 ms)
Parsing time: 00:01:38 (98378 ms)
Parsing time: 00:01:33 (93930 ms)
Parsing time: 00:01:26 (86749 ms)

I don't know why it took so long the first time I ran the logistic regression classifier. Maybe the probabilities got messed up and we ended up with a bad classifier. Other than that run, the parsing times range from about 1:26 to 1:38. The default SVM classifier seems more consistent, taking around 1:36.

In summary, the default configuration for MaltParser has more consistent run times and higher overall accuracies than logistic regression.

junekihong commented 11 years ago

Maybe we can show that our modifications to Maltparser can achieve better performance and run times than when it is using its default classifier?

jeisner commented 11 years ago

Whoa! Something is fishy here. Your parsing accuracy is dropping by 16-18 points, roughly doubling your error rate. That's horrible: there's no way we can accept such a decrease in accuracy. But I also don't believe it. Something must be going wrong.

(The difference in classification accuracy between SVM and logistic regression should be small, at most 1-2 percentage points. It is possible that certain classification errors might have disproportionate effect on parsing accuracy, but not that disproportionate.)

What parameters are you running with? In particular, how are you setting the cost C (the "-c" option)? You can set it automatically by cross-validation, I think: try leaving out the "-c" option but including "-v 5" to train with 5-fold cross-validation.
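What the -v option does internally can be sketched like this (illustrative Python, not LIBLINEAR's actual code): shuffle the example indices into n disjoint folds, then train on n-1 folds and score on the held-out one, rotating through the folds.

```python
import random

def k_fold_indices(n_examples, k=5, seed=0):
    # randomly split example indices into k disjoint folds, as
    # LIBLINEAR does before reporting cross-validation accuracy
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(20, k=5)  # 5 folds of 4 indices each
```

The point of choosing C this way is that it is tuned on held-out data rather than on the training set itself.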

It's also possible that you're not calling it correctly. The paper about LIBLINEAR says, among other things: "Several parameters are available for advanced use. For example, one may specify a parameter to obtain probability outputs for logistic regression. Details can be found in the README file." Where is that file? (Note: I am not saying that you want probability outputs, since the SVM wasn't giving probability outputs and the beam search code probably doesn't want them. You just want the linear score before it's converted to a probability, I assume.)

p.s. The time differences in the runs are not significant, except for that one outlier.

junekihong commented 11 years ago

The commands I used were:

Training:
java -jar ./dist/maltparser-1.7.2/maltparser-1.7.2.jar -c test -i examples/data/talbanken05_train.conll -m learn

Training with logistic regression:
java -jar ./dist/maltparser-1.7.2/maltparser-1.7.2.jar -lo "-s 0" -c test -i examples/data/talbanken05_train.conll -m learn

Testing:
java -jar ./dist/maltparser-1.7.2/maltparser-1.7.2.jar -c test -i examples/data/talbanken05_test.conll -o out.conll -m parse

Note: the -c flag here is MaltParser's run-configuration flag; LIBLINEAR's options are passed in through the -lo flag. I can set LIBLINEAR's -c to whatever value I want, but I cannot seem to use the LIBLINEAR -v flag with any value. MaltParser says: Unknown learner parameter: '-v' with value '5'.

I went and found the README file from Liblinear's source code. I downloaded it from here: http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/liblinear.cgi?+http://www.csie.ntu.edu.tw/~cjlin/liblinear+tar.gz

From this README file, I found the usage sections, which presumably cover the advanced parameters:

`train' Usage

Usage: train [options] training_set_file [model_file]
options:
-s type : set type of solver (default 1)
for multi-class classification:
0 -- L2-regularized logistic regression (primal)
1 -- L2-regularized L2-loss support vector classification (dual)
2 -- L2-regularized L2-loss support vector classification (primal)
3 -- L2-regularized L1-loss support vector classification (dual)
4 -- support vector classification by Crammer and Singer
5 -- L1-regularized L2-loss support vector classification
6 -- L1-regularized logistic regression
7 -- L2-regularized logistic regression (dual)
for regression:
11 -- L2-regularized L2-loss support vector regression (primal)
12 -- L2-regularized L2-loss support vector regression (dual)
13 -- L2-regularized L1-loss support vector regression (dual)
-c cost : set the parameter C (default 1)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-e epsilon : set tolerance of termination criterion
...
-B bias : if bias >= 0, instance x becomes [x; bias]; if < 0, no bias term added (default -1)
-wi weight: weights adjust the parameter C of different classes (see README for details)
-v n: n-fold cross validation mode
-q : quiet mode (no outputs)

Option -v randomly splits the data into n parts and calculates cross validation accuracy on them.
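The "-s 0" solver (L2-regularized logistic regression, the probabilistic option) minimizes 0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i w.x_i)). A toy gradient-descent sketch of that objective (illustrative only; LIBLINEAR actually uses a trust-region Newton method):

```python
import math

def train_logreg_l2(X, y, C=1.0, lr=0.05, epochs=1000):
    """Toy gradient descent on LIBLINEAR's -s 0 objective:
    0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i * w.x_i))."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = list(w)  # gradient of the 0.5 * ||w||^2 regularizer
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            coef = -yi * C / (1.0 + math.exp(margin))
            for j, xj in enumerate(xi):
                grad[j] += coef * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def prob_positive(w, x):
    # probability of the +1 class under the trained model (sigmoid of the score)
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-score))

# Toy linearly separable data: +1 points and their mirror images.
X = [[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]]
y = [1, 1, -1, -1]
w = train_logreg_l2(X, y)
```

Because the model is trained on this likelihood, its scores really are log-probabilities up to normalization, unlike the hinge-loss SVM solvers (-s 1 through -s 5), whose scores carry no probabilistic guarantee.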

`predict' Usage

Usage: predict [options] test_file model_file output_file
options:
-b probability_estimates: whether to output probability estimates, 0 or 1 (default 0); currently for logistic regression only
-q : quiet mode (no outputs)

Trying to use the -b flag seems to give a similar error to when I tried the -v flag.

junekihong commented 11 years ago

By any chance could I get a copy of a nice large set of CoNLL data? The 7-sentence Swedish example was probably way too small, and I would like to rerun the accuracy experiment on a larger data set.

I tried downloading and using the trial CoNLL data offered by the CoNLL website: http://conll.cemantix.org/2012/data.html It made MaltParser throw a NullPointerException...

jeisner commented 11 years ago

See issue #5

junekihong commented 11 years ago

Just as an update on my progress before the meeting.

Using a much larger English CoNLL training and testing data set, I reran MaltParser to measure its performance with different classifiers from the LIBLINEAR library. This was the CoNLL data available on the CLSP cluster. Repeating the same experiment as before, I made one run with MaltParser's default SVM and one using L2-regularized logistic regression (with no other settings changed; five-fold cross-validation was not set).

Previously, I reported that MaltParser performed much better using SVM than the probabilistic logistic regression. Rerunning on a much larger dataset, this still seems to be the case. Here are the results of the updated experiment:

==> original.accuracy <==
Labeled attachment score: 15552 / 19381 * 100 = 80.24 %
Unlabeled attachment score: 15927 / 19381 * 100 = 82.18 %
Label accuracy score: 16961 / 19381 * 100 = 87.51 %

================================================================================

==> modified.accuracy <==
Labeled attachment score: 14370 / 19381 * 100 = 74.14 %
Unlabeled attachment score: 14865 / 19381 * 100 = 76.70 %
Label accuracy score: 16313 / 19381 * 100 = 84.17 %

================================================================================

Jason also suggested trying five-fold cross validation (using the -v flag in LIBLINEAR). I spent some time looking through how MaltParser handles command-line arguments, and although I didn't find the direct interface where MaltParser calls into LIBLINEAR, I did find where its default LIBLINEAR values are set. There was nothing set for -v. Even after setting "-v 5" there, unfortunately the results are exactly the same as before. I wonder whether MaltParser simply does not support the -v flag for some reason. I am still looking for other places where the LIBLINEAR flag values are altered.

karuiwu commented 11 years ago

No longer relevant (using ZPar).

sidharthranjan commented 9 years ago

Can somebody please help me out with probability scores in MaltParser? I would be thankful.

http://stackoverflow.com/questions/28791352/how-to-get-probability-score-of-parsed-sentences-using-malt-parser