vinhkhuc / JFastText

Java interface for fastText

SIGSEGV on getWords after training #8

Open siegebell opened 7 years ago

siegebell commented 7 years ago

Using the example training data (and preprocessing it using the classification-example.sh script that comes with fasttext), I get a SIGSEGV when calling getWords after training.

Training: ft.runCmd("supervised -input dbpedia.train -output model.bin -dim 100 -lr 0.05 -wordNgrams 2 -minCount 5 -bucket 2000000 -epoch 5".split(" "))

model.bin is generated successfully, and if I load it instead of training, there is no crash. I suspect it's running out of memory, but calling unloadModel before getWords does not help. I also tried discarding the trained JFastText object and then calling loadModel, but model.bin seems to be generated asynchronously, so there is no good way to know when to call loadModel.

Crash log: hs_err_pid28676.txt

EDIT: version 0.3 on Mac OSX

vinhkhuc commented 7 years ago

Hi siegebell, I tried your command but couldn't reproduce the issue.

Based on the training command line, the output model file should be "model.bin.bin", not "model.bin" (fastText automatically appends the .bin suffix to the output model file). Could you check if you loaded the correct model file?
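To make the suffix rule above concrete, here is a minimal sketch that derives the expected model file name from a training argument array. The class and method names (ModelPaths, modelFileFor) are hypothetical helpers, not part of JFastText, and the sketch assumes only what is stated above: the model file is the -output value with ".bin" appended.

```java
// Hypothetical helper, not part of JFastText: works out which model file
// a training command is expected to produce.
final class ModelPaths {
    private ModelPaths() {}

    // Returns the value following "-output" with ".bin" appended,
    // mirroring the suffix rule described in the comment above.
    static String modelFileFor(String[] trainArgs) {
        // Stop one short of the end so that args[i + 1] is always valid.
        for (int i = 0; i < trainArgs.length - 1; i++) {
            if ("-output".equals(trainArgs[i])) {
                return trainArgs[i + 1] + ".bin";
            }
        }
        throw new IllegalArgumentException("No -output argument with a value was found");
    }
}
```

With the first command in this thread (-output model.bin) the helper returns "model.bin.bin"; with -output dbpedia it returns "dbpedia.bin", which is the name to pass to loadModel.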

siegebell commented 7 years ago

@vinhkhuc the training command I gave above was in error; it should be:

ft.runCmd("supervised -input dbpedia.train -output dbpedia -minCount 5 -wordNgrams 2 -bucket 2000000 -lr 0.05 -dim 100 -epoch 5 -thread 8".split(" "))

I've tried deleting and regenerating the normalized training data and the model, but the problem persists. Are you testing on OS X with JDK 1.8 and still unable to reproduce it?

siegebell commented 7 years ago

system info: macOS Sierra; version 10.12.4; 16 GB memory

vinhkhuc commented 7 years ago

@siegebell Yes, I'm using Sierra and Java 8. The following code, which calls getWords(), works fine for me:

import com.github.jfasttext.JFastText;

public class DebugIssue {
    public static void main(String[] args) {
        JFastText jft = new JFastText();
        // Train a supervised model; the output file is "dbpedia.bin".
        jft.runCmd(("supervised " +
                "-input ../cmd/data/dbpedia.train " +
                "-output dbpedia " +
                "-minCount 5 " +
                "-wordNgrams 2 " +
                "-bucket 2000000 " +
                "-lr 0.05 " +
                "-dim 100 " +
                "-epoch 5 " +
                "-thread 8").split(" "));
        // Load the trained model from disk before querying it.
        jft.loadModel("dbpedia.bin");
        System.out.println(jft.getWords());
    }
}

vinhkhuc commented 7 years ago

I get a SIGSEGV if I comment out the jft.loadModel("dbpedia.bin") call. That's expected: in that case no model is loaded, so getWords() crashes instead of returning the vocabulary.
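
Since a missing loadModel call ends in a native crash rather than a Java exception, one defensive option is to check the model file before loading it. The sketch below is only an illustration: ModelFileGuard and requireModelFile are hypothetical names, not part of JFastText, and the check covers only whether the file exists and is non-empty, not whether it is a valid fastText model.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of JFastText: fails with a regular Java
// exception when the model file is missing, instead of reaching native code.
final class ModelFileGuard {
    private ModelFileGuard() {}

    // Returns the path if it points to a non-empty regular file,
    // otherwise throws IllegalStateException with a readable message.
    static Path requireModelFile(String modelPath) throws IOException {
        Path path = Paths.get(modelPath);
        if (!Files.isRegularFile(path)) {
            throw new IllegalStateException("Model file not found: " + path.toAbsolutePath());
        }
        if (Files.size(path) == 0) {
            throw new IllegalStateException("Model file is empty: " + path.toAbsolutePath());
        }
        return path;
    }
}
```

In the DebugIssue example this would be used as jft.loadModel(ModelFileGuard.requireModelFile("dbpedia.bin").toString()); followed by jft.getWords(), so a wrong file name such as "model.bin" instead of "model.bin.bin" surfaces as an IllegalStateException.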