Open siegebell opened 7 years ago
Hi siegebell, I tried your command but couldn't reproduce the issue.
Based on the training command line, the output model file should be "model.bin.bin", not "model.bin" (fastText automatically appends the .bin suffix to the output model file). Could you check if you loaded the correct model file?
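The suffix rule described above can be sketched as a small helper. This is only an illustration, assuming the model file name is the `-output` value with ".bin" appended, as stated in this comment. `ModelPaths` and `modelFileFor` are hypothetical names and are not part of JFastText.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JFastText: derives the model file name
// that training is expected to produce from the command-line arguments.
final class ModelPaths {
    private ModelPaths() {}

    /** Returns the value following "-output" with ".bin" appended. */
    static String modelFileFor(String[] args) {
        List<String> list = Arrays.asList(args);
        int i = list.indexOf("-output");
        if (i < 0 || i + 1 >= list.size()) {
            throw new IllegalArgumentException("no -output value in arguments");
        }
        // Assumption taken from the comment above: ".bin" is always appended,
        // even when the -output value already ends in ".bin".
        return list.get(i + 1) + ".bin";
    }

    public static void main(String[] ignored) {
        String[] args = "supervised -input dbpedia.train -output model.bin -dim 100".split(" ");
        System.out.println(modelFileFor(args));
    }
}
```

Passing the same argument array to both the training call and this helper keeps the path given to loadModel consistent with what was actually requested.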
@vinhkhuc the training command I gave above was in error; it should be:
ft.runCmd("supervised -input dbpedia.train -output dbpedia -minCount 5 -wordNgrams 2 -bucket 2000000 -lr 0.05 -dim 100 -epoch 5 -thread 8".split(" "))
I've tried deleting and regenerating the normalized training data and the model, but the problem persists. Have you tested this on OS X with JDK 1.8 and still been unable to reproduce it?
system info: macOS Sierra; version 10.12.4; 16 GB memory
@siegebell Yes, I'm using Sierra and Java 8. The following code which calls getWords() works fine for me.
import com.github.jfasttext.JFastText;

public class DebugIssue {
    public static void main(String[] args) {
        JFastText jft = new JFastText();
        jft.runCmd(("supervised " +
                "-input ../cmd/data/dbpedia.train " +
                "-output dbpedia " +
                "-minCount 5 " +
                "-wordNgrams 2 " +
                "-bucket 2000000 " +
                "-lr 0.05 " +
                "-dim 100 " +
                "-epoch 5 " +
                "-thread 8").split(" "));
        jft.loadModel("dbpedia.bin");
        System.out.println(jft.getWords());
    }
}
I got a SIGSEGV when I commented out the line jft.loadModel("dbpedia.bin"); but that's expected, since in that case the model is not loaded, hence the crash.
Using the example training data (and preprocessing it using the classification-example.sh script that comes with fasttext), I get a SIGSEGV when calling getWords after training.
Training:
ft.runCmd("supervised -input dbpedia.train -output model.bin -dim 100 -lr 0.05 -wordNgrams 2 -minCount 5 -bucket 2000000 -epoch 5".split(" "))
model.bin is successfully generated; and if I load it instead of training, there is no crash. I suspect it's running out of memory; but calling unloadModel before getWords does not help. I tried discarding the trained JFastText object and then running loadModel, but it seems model.bin is generated asynchronously so there is no good way to know when to call loadModel.
Crash log: hs_err_pid28676.txt
EDIT: version 0.3 on Mac OSX
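One defensive option before calling loadModel is to confirm that the model file exists and is non-empty. The sketch below uses only the Java standard library. `ModelFileCheck` and `isReady` are hypothetical names, not JFastText API. The check cannot prove that writing has finished, and the thread does not confirm that the file is written asynchronously, so treat it as a sanity check rather than a fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of JFastText: a sanity check to run
// before handing a model path to a loader.
final class ModelFileCheck {
    private ModelFileCheck() {}

    /** True if the path is a regular file with at least one byte. */
    static boolean isReady(Path modelFile) {
        try {
            return Files.isRegularFile(modelFile) && Files.size(modelFile) > 0;
        } catch (IOException e) {
            // Treat an unreadable file the same as a missing one.
            return false;
        }
    }

    public static void main(String[] args) {
        // Path is an assumption for illustration; use the file your training run writes.
        Path model = Paths.get(args.length > 0 ? args[0] : "dbpedia.bin");
        System.out.println(model + " ready: " + isReady(model));
    }
}
```

If the check returns false, skipping loadModel and getWords avoids calling into native code without a loaded model, which is the situation reported above to end in a SIGSEGV.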