There was indeed a bug in reading in the Genia training corpus: I had accidentally used the sample corpus as the training corpus. The correct training corpus contains over 20k sentences. Wolfe seems to be quite memory-hungry at the moment — I needed to provide 12 GB of RAM. Results after 20 epochs of averaged perceptron learning (~1h for training and testing) look quite good, although there is still much room for improvement. The performance is head-to-head with the winner of the 2004 shared task (72.5 F₁):
http://acl.ldc.upenn.edu/coling2004/W1/pdf/19.pdf
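Wolfe's actual training code isn't shown here, but the core update of averaged perceptron learning can be sketched as follows (a hypothetical minimal binary-classification version with lazy averaging, not the structured NER model used above):

```python
from collections import defaultdict

class AveragedPerceptron:
    """Minimal averaged perceptron over sparse feature dicts.
    Hypothetical sketch for illustration, not Wolfe's implementation."""

    def __init__(self):
        self.weights = defaultdict(float)     # current weight vector
        self.totals = defaultdict(float)      # accumulated weights for averaging
        self.timestamps = defaultdict(int)    # step at which each weight last changed
        self.step = 0

    def score(self, features):
        return sum(self.weights[f] * v for f, v in features.items())

    def update(self, features, label):
        # label is +1 or -1; the perceptron updates only on mistakes
        self.step += 1
        if label * self.score(features) <= 0:
            for f, v in features.items():
                # lazily accumulate the old weight before overwriting it
                self.totals[f] += (self.step - self.timestamps[f]) * self.weights[f]
                self.timestamps[f] = self.step
                self.weights[f] += label * v

    def average(self):
        # replace each weight with its running average over all steps,
        # which is what makes the averaged variant more stable than plain perceptron
        for f in self.weights:
            self.totals[f] += (self.step - self.timestamps[f]) * self.weights[f]
            self.timestamps[f] = self.step
            self.weights[f] = self.totals[f] / max(self.step, 1)

# toy usage: 20 epochs, mirroring the epoch count in the experiment above
data = [({"x": 1.0}, 1), ({"x": -1.0}, -1), ({"x": 2.0}, 1), ({"x": -2.0}, -1)]
model = AveragedPerceptron()
for _ in range(20):
    for feats, y in data:
        model.update(feats, y)
model.average()
```

The lazy-averaging trick (timestamps plus accumulated totals) keeps each update proportional to the number of active features rather than the full vocabulary, which matters with the feature counts a 20k-sentence corpus produces.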
Train:
  Total Gold:  109588
  Total Guess: 120081
  Precision:   0.852466
  Recall:      0.934089
  F1:          0.891413

Test:
  Total Gold:  19392
  Total Guess: 22570
  Precision:   0.673859
  Recall:      0.784292
  F1:          0.724894
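The scores above are the standard exact-match precision/recall/F1 over entity counts; a minimal sketch of the computation (the matched count 15209 is back-derived from precision × guess on the test set, not taken from the actual evaluation log):

```python
def prf1(n_correct, n_gold, n_guess):
    """Exact-match precision, recall, and F1 from entity counts."""
    p = n_correct / n_guess if n_guess else 0.0   # fraction of guesses that are right
    r = n_correct / n_gold if n_gold else 0.0     # fraction of gold entities found
    f1 = 2 * p * r / (p + r) if p + r else 0.0    # harmonic mean of the two
    return p, r, f1

# test-set counts from the table above; 15209 correct matches is a
# back-derived figure (precision x total guess), hypothetical
p, r, f1 = prf1(15209, 19392, 22570)
print(f"P={p:.6f} R={r:.6f} F1={f1:.6f}")
```

Over-generation shows up directly in these counts: with ~3k more guesses than gold entities on the test set, recall runs well ahead of precision.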